pith · machine review for the scientific record

arxiv: 2604.24332 · v1 · submitted 2026-04-27 · 💻 cs.LG · cs.CR

Recognition: unknown

Mitigating Error Amplification in Fast Adversarial Training

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 04:11 UTC · model grok-4.3

classification: 💻 cs.LG · cs.CR
keywords: fast adversarial training · catastrophic overfitting · adversarial robustness · dynamic guidance · confidence modulation · robustness-accuracy trade-off · error amplification

The pith

Dynamically scaling perturbations by sample confidence mitigates catastrophic overfitting in fast adversarial training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that low-confidence samples drive both catastrophic overfitting to the training attack and the growing drop in clean accuracy as robustness demands increase in fast adversarial training. By analyzing how guidance strength interacts with confidence groups, it introduces Distribution-aware Dynamic Guidance to reduce perturbation magnitude for uncertain examples and soften supervision when predictions are unreliable. A weighted regularization term stabilizes the resulting gradients. This matters because fast adversarial training has been limited by the need to choose between speed, robustness to unseen attacks, and usable clean performance. If the mechanism holds, it allows practitioners to obtain more reliable robust models without extending training time.

Core claim

Low-confidence samples are the primary contributors to catastrophic overfitting and the robustness-accuracy trade-off in fast adversarial training. The Distribution-aware Dynamic Guidance strategy scales the perturbation budget according to each sample's confidence in the ground-truth class, preventing spurious correlations, while adjusting the supervision signal based on prediction state to avoid reinforcing errors; a weighted regularization constraint further prevents gradient instability from these dynamic changes.

What carries the argument

A Distribution-aware Dynamic Guidance (DDG) strategy that dynamically adjusts both the perturbation magnitude and the supervision signal according to each sample's confidence in the ground-truth class.
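
To make the mechanism concrete, the sketch below shows what one confidence-guided training step could look like in PyTorch. The function name `ddg_step`, the linear budget scaling `eps * conf`, the softening weight `w`, and the logit-matching regularizer are illustrative assumptions standing in for the paper's Eqs. (4)-(8), which this review does not reproduce.

```python
# Hypothetical sketch of a confidence-guided FAT step; the scaling and
# weighting rules here are assumptions, not the paper's exact Eqs. (4)-(8).
import torch
import torch.nn.functional as F

def ddg_step(model, x, y, eps=8/255, lam=0.5):
    # Per-sample confidence in the ground-truth class and prediction state.
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
        conf = probs.gather(1, y[:, None]).squeeze(1)   # (B,)
        correct = probs.argmax(dim=1).eq(y).float()     # (B,)

    # Dynamic perturbation budget: uncertain samples get weaker attacks.
    eps_i = eps * conf.view(-1, 1, 1, 1)
    delta = torch.zeros_like(x, requires_grad=True)
    grad = torch.autograd.grad(
        F.cross_entropy(model(x + delta), y), delta)[0]
    x_adv = (x + eps_i * grad.sign()).clamp(0, 1)

    # Dynamic supervision: soften the loss when the prediction is wrong,
    # so the adversarial objective does not reinforce an incorrect signal.
    w = correct + (1 - correct) * conf                  # in (0, 1]
    adv_loss = (w * F.cross_entropy(model(x_adv), y, reduction="none")).mean()

    # Weighted regularization toward the clean logits for gradient stability
    # (again an assumed form; the paper's constraint may differ).
    reg = (w * (model(x_adv) - model(x)).pow(2).mean(dim=1)).mean()
    return adv_loss + lam * reg
```

In a training loop one would call `loss = ddg_step(model, x, y)` followed by `loss.backward()`; the essential point is that every guidance quantity above is per-sample rather than batch-uniform.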

If this is right

  • DDG reduces catastrophic overfitting while maintaining higher clean accuracy across standard image classification benchmarks.
  • The robustness-accuracy trade-off becomes less severe as the perturbation budget increases.
  • Weighted regularization prevents gradient instability that would otherwise arise from dynamic guidance.
  • Samples are guided toward consistent decision boundaries that generalize beyond the specific training attack.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same confidence-based modulation might improve other accelerated adversarial training methods that currently suffer from error amplification.
  • Combining DDG with augmentation techniques that shift confidence distributions could yield further gains on larger models or datasets.
  • The approach suggests a general principle for controlling guidance strength in any loss that involves adversarial perturbations.

Load-bearing premise

Low-confidence samples are the primary contributors to CO and the robustness-accuracy trade-off, and dynamically scaling perturbation and supervision according to confidence will consistently guide samples toward better decision boundaries without introducing new instabilities.

What would settle it

An experiment in which low-confidence samples are removed or up-weighted during standard fast adversarial training and catastrophic overfitting still occurs at the same rate, or in which DDG is applied but robustness to unseen attacks shows no improvement over the baseline.
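
A hedged sketch of the first arm of that experiment: FGSM-with-random-start FAT in the style of Wong et al. [41], with the lowest-confidence samples simply dropped from each batch. The threshold `conf_min`, the step size `alpha`, and the function name are assumptions; the diagnostic is whether robust (e.g., PGD) accuracy still collapses mid-training despite the filtering.

```python
# Hypothetical ablation: plain FGSM-RS FAT with low-confidence samples
# removed per batch. Names and thresholds are illustrative assumptions.
import torch
import torch.nn.functional as F

def fat_epoch_filtered(model, loader, opt, eps=8/255, alpha=10/255, conf_min=0.3):
    model.train()
    for x, y in loader:
        # Per-sample ground-truth confidence under the current model.
        with torch.no_grad():
            conf = F.softmax(model(x), dim=1).gather(1, y[:, None]).squeeze(1)
        keep = conf >= conf_min        # discard the low-confidence group
        if not keep.any():
            continue
        x, y = x[keep], y[keep]

        # Random-start single-step attack (FGSM-RS style).
        delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
        grad = torch.autograd.grad(
            F.cross_entropy(model(x + delta), y), delta)[0]
        delta = (delta.detach() + alpha * grad.sign()).clamp(-eps, eps)

        opt.zero_grad()
        F.cross_entropy(model((x + delta).clamp(0, 1)), y).backward()
        opt.step()
```

Tracking PGD accuracy after each epoch of `fat_epoch_filtered` would show whether CO persists once the supposedly responsible samples are gone.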

Figures

Figures reproduced from arXiv: 2604.24332 by Bo Wang, Geyong Min, Hong Zhong, Lihe Zhang, Mengnan Zhao, Tianhang Zheng.

Figure 1. CO analysis on CIFAR-10 and ResNet-18. Each training … view at source ↗
Figure 2. Trade-off analysis on CIFAR-10 with ResNet-18. PGD and C&W … view at source ↗
Figure 3. Supervision strength analysis on CIFAR-10 with ResNet-18. PGD and C&W … view at source ↗
Figure 4. The impact of τ1 in Eq. (5) and λ in Eq. (8) on FAT performance. Under stable training, clean accuracy increases as τ1 grows, while robust accuracy gradually decreases. Conversely, as λ increases, clean accuracy declines whereas robust accuracy improves. Notably, the influence of these hyper-parameters on overall performance is modest: clean, PGD, and C&W accuracies vary within 83% ± 0.5, 60% ± 0.5, … view at source ↗
read the original abstract

Fast Adversarial Training (FAT) has proven effective in enhancing model robustness by encouraging networks to learn perturbation-invariant representations. However, FAT often suffers from catastrophic overfitting (CO), where the model overfits to the training attack and fails to generalize to unseen ones. Moreover, robustness-oriented optimization typically leads to notable performance degradation on clean inputs, and such degradation becomes increasingly severe as the perturbation budget grows. In this work, we conduct a comprehensive analysis of how guidance strength affects model performance by modulating perturbation and supervision levels across distinct confidence groups. The findings reveal that low-confidence samples are the primary contributors to CO and the robustness-accuracy trade-off. Building on this insight, we propose a Distribution-aware Dynamic Guidance (DDG) strategy that dynamically adjusts both the perturbation budget and supervision signal. Specifically, DDG scales the perturbation magnitude according to the sample confidence at the ground-truth class, thereby guiding samples toward consistent decision boundaries while mitigating the influence of learning spurious correlations. Simultaneously, it dynamically adjusts the supervision signal based on the prediction state of each sample, preventing overemphasis on incorrect signals. To alleviate potential gradient instability arising from dynamic guidance, we further design a weighted regularization constraint. Extensive experiments on standard benchmarks demonstrate that DDG effectively alleviates both CO and the robustness-accuracy trade-off.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript analyzes Fast Adversarial Training (FAT) and identifies low-confidence samples as the primary contributors to catastrophic overfitting (CO) and the robustness-accuracy trade-off through group-wise modulation experiments. It proposes Distribution-aware Dynamic Guidance (DDG), which scales perturbation magnitude by ground-truth class confidence and adjusts the supervision signal according to each sample's prediction state, supplemented by a weighted regularization term to stabilize gradients. Experiments on CIFAR-10, CIFAR-100, and SVHN with PGD and AutoAttack demonstrate that DDG mitigates both CO and the trade-off.

Significance. If the empirical results hold, this provides a practical, sample-adaptive method to stabilize FAT without additional computational overhead, advancing efficient robust training. The confidence-group analysis offers mechanistic insight into why uniform guidance fails in FAT. Strengths include the ablation tables, group-wise modulation results, and reproducible benchmark comparisons that directly support the proposed mechanism.

major comments (3)
  1. [§4.2] §4.2, Eq. (4): the dynamic supervision adjustment is stated to depend on prediction state, but the manuscript provides no derivation showing how this term interacts with the cross-entropy loss to avoid reinforcing incorrect labels on low-confidence samples; a short proof sketch or gradient analysis would strengthen the claim.
  2. [Table 2] Table 2 (CIFAR-100, AutoAttack column): the reported robustness gain of DDG over the FAT baseline is 3.2 percentage points, yet no standard deviation across seeds is given and the ablation isolating the regularization term shows only a 0.8 point drop when removed; this makes it hard to assess whether the full gain is robust or sensitive to hyper-parameters.
  3. [§5.3] §5.3: the claim that low-confidence samples drive both CO and the trade-off rests on the group-wise modulation results, but the paper does not report what fraction of the training set falls into the lowest-confidence bin or whether the effect persists when the bin boundaries are varied by ±10%.
minor comments (2)
  1. [§3] Notation: the symbol for ground-truth confidence is introduced in §3 without an explicit definition equation; adding one line would improve readability.
  2. [Figure 3] Figure 3 caption: the x-axis label 'confidence group' should specify the exact bin edges used (e.g., [0-0.3], [0.3-0.6], etc.) so readers can reproduce the grouping.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the positive recommendation and constructive comments. We address each major point below and will incorporate the suggested clarifications and additions in the revised manuscript.

read point-by-point responses
  1. Referee: [§4.2] §4.2, Eq. (4): the dynamic supervision adjustment is stated to depend on prediction state, but the manuscript provides no derivation showing how this term interacts with the cross-entropy loss to avoid reinforcing incorrect labels on low-confidence samples; a short proof sketch or gradient analysis would strengthen the claim.

    Authors: We agree that a derivation would strengthen the mechanistic claim. In the revision we will add a concise gradient analysis immediately after Equation (4) in §4.2. The analysis shows that the state-dependent weight multiplies the standard cross-entropy gradient; when a low-confidence sample is misclassified, the weight is reduced, thereby attenuating the gradient component that would otherwise reinforce the incorrect label. We will include the relevant partial-derivative steps (a sketch of the assumed form follows this list). revision: yes

  2. Referee: [Table 2] Table 2 (CIFAR-100, AutoAttack column): the reported robustness gain of DDG over the FAT baseline is 3.2 percentage points, yet no standard deviation across seeds is given and the ablation isolating the regularization term shows only a 0.8 point drop when removed; this makes it hard to assess whether the full gain is robust or sensitive to hyper-parameters.

    Authors: We will update Table 2 to report mean and standard deviation over three random seeds, confirming that the 3.2-point gain remains statistically significant. We will also expand the accompanying text to note that the modest 0.8-point ablation drop indicates the regularization term primarily stabilizes training rather than driving the main improvement, and we will reference the hyper-parameter sensitivity study already present in the appendix to address robustness concerns. revision: yes

  3. Referee: [§5.3] §5.3: the claim that low-confidence samples drive both CO and the trade-off rests on the group-wise modulation results, but the paper does not report what fraction of the training set falls into the lowest-confidence bin or whether the effect persists when the bin boundaries are varied by ±10%.

    Authors: We will augment §5.3 with the empirical distribution of samples across confidence bins (including the exact fraction in the lowest bin). We will also add a short sensitivity experiment that recomputes the group-wise modulation results after shifting all bin boundaries by ±10% and will report that the qualitative conclusions regarding the role of low-confidence samples remain unchanged (a sketch of this check follows this list). revision: yes
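
Two hedged sketches of the analyses promised above. First, the gradient argument from response 1 plausibly reduces to a weighted cross-entropy gradient; the weight symbol w_i and its behavior are assumptions standing in for the paper's Eq. (4), which this review does not reproduce:

```latex
% Assumed form: Eq. (4) acts as a per-sample weight w_i on the
% cross-entropy, with logits z_i and p_i = softmax(z_i).
\mathcal{L}_i = w_i \,\mathrm{CE}(p_i, y_i),
\qquad
\frac{\partial \mathcal{L}_i}{\partial z_i} = w_i \left( p_i - e_{y_i} \right)
```

Here e_{y_i} is the one-hot label: if w_i shrinks for misclassified low-confidence samples, the update toward the possibly unreliable label is attenuated in exact proportion, which is the attenuation response 1 describes. Second, the bin-fraction and ±10% boundary check from response 3 amounts to a few lines; the nominal edges (taken from the Figure 3 comment), the shift rule, and the placeholder confidences are all assumptions:

```python
# Hypothetical bin-fraction and sensitivity check; edges and data are
# illustrative placeholders, not values from the paper.
import numpy as np

def bin_fractions(conf, edges):
    """Fraction of samples falling in each confidence bin."""
    counts, _ = np.histogram(conf, bins=edges)
    return counts / conf.size

conf = np.random.rand(50_000)            # stand-in for real confidences
nominal = np.array([0.0, 0.3, 0.6, 1.0])
for scale in (0.9, 1.0, 1.1):            # shift interior edges by ±10%
    edges = nominal.copy()
    edges[1:-1] *= scale                 # outer edges stay at 0 and 1
    print(f"scale={scale}: {bin_fractions(conf, edges)}")
```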

Circularity Check

0 steps flagged

No significant objection identified

full rationale

The paper conducts an empirical analysis of guidance strength across confidence groups, identifies low-confidence samples as drivers of CO and the accuracy-robustness trade-off, and proposes DDG with per-sample dynamic scaling of perturbation (by ground-truth confidence) and supervision (by prediction state) plus weighted regularization. These components are formulated independently of the final benchmark metrics; no equations reduce reported gains to fitted parameters by construction, and the validation on CIFAR/SVHN with PGD/AutoAttack remains external to the definitions. The derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach relies on standard supervised learning assumptions and the existence of a meaningful confidence signal from the model; no new physical or mathematical entities are postulated.

axioms (1)
  • domain assumption: bounded perturbation norms and gradient-based attack generation, as in standard adversarial training.
    Implicit foundation of FAT and the proposed dynamic scaling.

pith-pipeline@v0.9.0 · 5532 in / 1141 out tokens · 44224 ms · 2026-05-08T04:11:24.988468+00:00 · methodology


Reference graph

Works this paper leans on

56 extracted references · 8 canonical work pages · 1 internal anchor

  1. Maksym Andriushchenko and Nicolas Flammarion. Understanding and improving fast adversarial training. In NIPS, pages 16048–16059, 2020.
  2. Yulong Cao, Chaowei Xiao, Anima Anandkumar, Danfei Xu, and Marco Pavone. AdvDO: Realistic adversarial attacks for trajectory prediction. In ECCV, pages 36–52. Springer, 2022.
  3. Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In S&P, pages 39–57, 2017.
  4. Stephen Casper, Lennart Schulze, Oam Patel, and Dylan Hadfield-Menell. Defending against unforeseen failure modes with latent adversarial training. arXiv preprint arXiv:2403.05030, 2024.
  5. Yaya Cheng, Jingkuan Song, Xiaosu Zhu, Qilong Zhang, Lianli Gao, and Heng Tao Shen. Fast gradient non-sign methods. arXiv preprint arXiv:2110.12734, 2021.
  6. Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In ICML, 2020.
  7. Pau de Jorge Aranda, Adel Bibi, Riccardo Volpi, Amartya Sanyal, Philip Torr, Grégory Rogez, and Puneet Dokania. Make some noise: Reliable and efficient single-step adversarial training. NIPS, 35:12881–12893, 2022.
  8. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248–255. IEEE, 2009.
  9. Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In CVPR, pages 9185–9193, 2018.
  10. Feiteng Fang, Yuelin Bai, Shiwen Ni, Min Yang, Xiaojun Chen, and Ruifeng Xu. Enhancing noise robustness of retrieval-augmented language models with adaptive adversarial training. arXiv preprint arXiv:2405.20978, 2024.
  11. Francesco Croce and Matthias Hein. Minimally distorted adversarial examples with a fast adaptive boundary attack. In ICML, pages 2196–2205, 2020.
  12. Z. Golgooni, M. Saberi, M. Eskandar, and M. H. Rohban. ZeroGrad: Mitigating and explaining catastrophic overfitting in FGSM adversarial training. arXiv preprint arXiv:2103.15476, 2021.
  13. Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
  14. Jindong Gu, Hengshuang Zhao, Volker Tresp, and Philip H. S. Torr. SegPGD: An effective and efficient adversarial attack for evaluating and boosting segmentation robustness. In ECCV, pages 308–325. Springer, 2022.
  15. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
  16. Zhichao Huang, Yanbo Fan, Chen Liu, Weizhong Zhang, Yong Zhang, Mathieu Salzmann, Sabine Süsstrunk, and Jue Wang. Fast adversarial training with adaptive step size. TIP, 32:6102–6114, 2023.
  17. Xiaojun Jia, Yong Zhang, Baoyuan Wu, Ke Ma, Jue Wang, and Xiaochun Cao. LAS-AT: Adversarial training with learnable attack strategy. In CVPR, pages 13398–13408, 2022.
  18. Xiaojun Jia, Yong Zhang, Baoyuan Wu, Jue Wang, and Xiaochun Cao. Boosting fast adversarial training with learnable adversarial initialization. TIP, 31:4417–4430, 2022.
  19. Xiaojun Jia, Yuefeng Chen, Xiaofeng Mao, Ranjie Duan, Jindong Gu, Rong Zhang, Hui Xue, Yang Liu, and Xiaochun Cao. Revisiting and exploring efficient fast adversarial training via LAW: Lipschitz regularization and auto weight averaging. TIFS, 2024.
  20. Xiaojun Jia, Yong Zhang, Xingxing Wei, Baoyuan Wu, Ke Ma, Jue Wang, and Xiaochun Cao. Improving fast adversarial training with prior-guided knowledge. PAMI, 2024.
  21. Hoki Kim, Woojin Lee, and Jaewook Lee. Understanding catastrophic overfitting in single-step adversarial training. In AAAI, pages 8119–8127, 2021.
  22. Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  23. A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In ICLR, 2017.
  24. Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In Artificial Intelligence Safety and Security, pages 99–112. Chapman and Hall/CRC, 2018.
  25. Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, and Xiaolin Huang. Subspace adversarial training. In CVPR, pages 13409–13418, 2022.
  26. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
  27. Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: A query-efficient black-box adversarial attack via random search. In ECCV, 2020.
  28. Yichuan Mo, Dongxian Wu, Yifei Wang, Yiwen Guo, and Yisen Wang. When adversarial training meets vision transformers: Recipes from training to architecture. NIPS, 35:18599–18611, 2022.
  29. Axi Niu, Kang Zhang, Chaoning Zhang, Chenshuang Zhang, In So Kweon, Chang D. Yoo, and Yanning Zhang. Fast adversarial training with noise augmentation: A unified perspective on RandStart and GradAlign. arXiv preprint arXiv:2202.05488, 2022.
  30. Chao Pan, Qing Li, and Xin Yao. Adversarial initialization with universal adversarial perturbation: A new approach to fast adversarial training. In AAAI, pages 21501–21509, 2024.
  31. Geon Yeong Park and Sang Wan Lee. Reliably fast adversarial training via latent adversarial perturbation. In ICCV, pages 7758–7767, 2021.
  32. Leslie Rice, Eric Wong, and Zico Kolter. Overfitting in adversarially robust deep learning. In ICML, pages 8093–8104. PMLR, 2020.
  33. Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! NIPS, 32, 2019.
  34. Pengyang Shao, Naixin Zhai, Lei Chen, Yonghui Yang, Fengbin Zhu, Xun Yang, and Meng Wang. Baldro: A distributionally robust optimization based framework for large language model unlearning. arXiv preprint arXiv:2601.09172, 2026.
  35. Gaurang Sriramanan, Sravanti Addepalli, Arya Baburaj, et al. Guided adversarial attack for evaluating and enhancing adversarial defenses. NIPS, 33:20297–20308, 2020.
  36. Gaurang Sriramanan, Sravanti Addepalli, Arya Baburaj, et al. Towards efficient and effective adversarial training. In NIPS, pages 11821–11833, 2021.
  37. Keke Tang, Tianrui Lou, Weilong Peng, Nenglun Chen, Yawen Shi, and Wenping Wang. Effective single-step adversarial training with energy-based models. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024.
  38. Kun Tong, Chengze Jiang, Jie Gui, and Yuan Cao. Taxonomy driven fast adversarial training. In AAAI, pages 5233–5242, 2024.
  39. Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma, and Quanquan Gu. Improving adversarial robustness requires revisiting misclassified examples. In ICLR, 2019.
  40. Zeyu Wang, Xianhang Li, Hongru Zhu, and Cihang Xie. Revisiting adversarial training at scale. In CVPR, pages 24675–24685, 2024.
  41. Eric Wong, Leslie Rice, and J. Zico Kolter. Fast is better than free: Revisiting adversarial training. In ICLR, 2020.
  42. Xiaojun Jia, Yong Zhang, Xingxing Wei, Baoyuan Wu, Ke Ma, Jue Wang, and Xiaochun Cao. Prior-guided adversarial initialization for fast adversarial training. In ECCV, 2022.
  43. Ling Yang, Haotian Qian, Zhilong Zhang, Jingwei Liu, and Bin Cui. Structure-guided adversarial training of diffusion models. In CVPR, pages 7256–7266, 2024.
  44. Yichen Yang, Xin Liu, and Kun He. Fast adversarial training against textual adversarial attacks. arXiv preprint arXiv:2401.12461, 2024.
  45. Yi Yu, Yufei Wang, Wenhan Yang, Lanqing Guo, Shijian Lu, Ling-Yu Duan, Yap-Peng Tan, and Alex C. Kot. Robust and transferable backdoor attacks against deep image compression with selective frequency prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
  46. Yi Yu, Song Xia, Xun Lin, Chenqi Kong, Wenhan Yang, Shijian Lu, Yap-Peng Tan, and Alex C. Kot. Towards model resistant to transferable adversarial examples via trigger activation. IEEE Transactions on Information Forensics and Security, 2025.
  47. Yi Yu, Song Xia, Xun Lin, Wenhan Yang, Shijian Lu, Yap-Peng Tan, and Alex Kot. Backdoor attacks against no-reference image quality assessment models via a scalable trigger. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025.
  48. Yi Yu, Song Xia, Siyuan Yang, Chenqi Kong, Wenhan Yang, Shijian Lu, Yap-Peng Tan, and Alex Kot. MTL-UE: Learning to learn nothing for multi-task learning. In International Conference on Machine Learning, 2025.
  49. Xinli Yue, Ningping Mou, Qian Wang, and Lingchen Zhao. Revisiting adversarial training under long-tailed distributions. In CVPR, pages 24492–24501, 2024.
  50. Naixin Zhai, Pengyang Shao, Binbin Zheng, Yonghui Yang, Fei Shen, Long Bai, and Xun Yang. Maximizing local entropy where it matters: Prefix-aware localized LLM unlearning. arXiv preprint arXiv:2601.03190, 2026.
  51. Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In ICML, pages 7472–7482. PMLR, 2019.
  52. Yihua Zhang, Guanhua Zhang, Prashant Khanduri, Mingyi Hong, Shiyu Chang, and Sijia Liu. Revisiting and advancing fast adversarial training through the lens of bi-level optimization. In ICML, pages 26693–26712. PMLR, 2022.
  53. Yimeng Zhang, Xin Chen, Jinghan Jia, Yihua Zhang, Chongyu Fan, Jiancheng Liu, Mingyi Hong, Ke Ding, and Sijia Liu. Defensive unlearning with adversarial training for robust concept erasure in diffusion models. NIPS, 37:36748–36776, 2024.
  54. Mengnan Zhao, Lihe Zhang, Yuqiu Kong, and Baocai Yin. Fast adversarial training with smooth convergence. In ICCV, pages 4720–4729, 2023.
  55. Mengnan Zhao, Lihe Zhang, Yuqiu Kong, and Baocai Yin. Catastrophic overfitting: A potential blessing in disguise. In ECCV, pages 293–310. Springer, 2024.
  56. Yiqi Zhong, Xianming Liu, Deming Zhai, Junjun Jiang, and Xiangyang Ji. Shadows can be dangerous: Stealthy and effective physical-world adversarial attack by natural phenomenon. In CVPR, pages 15345–15354, 2022.