pith. machine review for the scientific record.

arxiv: 2604.24350 · v1 · submitted 2026-04-27 · 💻 cs.LG · cs.AI · cs.CR

Recognition: unknown

Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 04:25 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CR
keywords catastrophic overfitting · fast adversarial training · backdoor attacks · unlearnable tasks · adversarial robustness · neural network training · model interpretation

The pith

Catastrophic overfitting in fast adversarial training functions as a weak backdoor trigger variant of unlearnable tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks a systematic explanation for why fast adversarial training produces models that overfit catastrophically to one attack and lose robustness to others. It does so by examining training dynamics and finding signatures that match backdoor behavior: divided pathways in the network, inconsistent feature predictions across classes, and triggers that distinguish classes universally. A reader would care because this framing turns an opaque failure mode into a recognizable mechanism, opening the door to proven countermeasures from backdoor research instead of ad-hoc fixes.
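
To make the last of those signatures concrete, the sketch below (PyTorch-style Python, an editorial illustration rather than the paper's own protocol) probes for a universal class-distinguishable trigger: it optimizes one shared, eps-bounded perturbation against a frozen model and reports how often that single pattern drags arbitrary inputs into a chosen target class. The `model` and `images` arguments are placeholders for a FAT-trained classifier and an evaluation batch with pixels in [0, 1].

```python
import torch
import torch.nn.functional as F

def universal_trigger_rate(model, images, target_class, eps=8/255, steps=200, lr=1e-2):
    """Fraction of inputs pushed into target_class by one shared eps-bounded perturbation."""
    model.eval()
    # a single perturbation shared by every image, constrained to an L-inf ball
    delta = torch.zeros_like(images[:1], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.full((images.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        logits = model((images + delta).clamp(0, 1))
        loss = F.cross_entropy(logits, target)   # pull every input toward the target class
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)              # keep the trigger within the perturbation budget
    with torch.no_grad():
        pred = model((images + delta).clamp(0, 1)).argmax(dim=1)
    return (pred == target).float().mean().item()
```

A high success rate for a perturbation this weak is the kind of evidence the paper reads as backdoor-like.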

Core claim

We interpret catastrophic overfitting in fast adversarial training as a backdoor phenomenon, supported by evidence of pathway division, diverse feature predictions, and universal class-distinguishable triggers. This leads us to conceptualize CO as a weak trigger variant of unlearnable tasks, placing CO, backdoor attacks, and unlearnable tasks inside one theoretical framework. The same view directly motivates backdoor-style interventions: recalibrating parameters through fine-tuning, linear probing, or reinitialization, plus a weight-outlier suppression term to curb abnormal weight growth.
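
This review does not reproduce the paper's regularizer (its Eq. (12) Lreg), so the following is only a plausible sketch of what a weight-outlier suppression term could look like: a penalty on weight entries that drift more than k standard deviations from their layer-wise mean, added to the adversarial loss with a coefficient beta. The threshold rule and all names here are illustrative assumptions, not the authors' definition.

```python
import torch
import torch.nn as nn

def weight_outlier_penalty(model: nn.Module, k: float = 3.0) -> torch.Tensor:
    """Penalize weight entries lying more than k standard deviations from their layer mean."""
    terms = []
    for w in model.parameters():
        if w.dim() < 2:                                  # skip biases and normalization parameters
            continue
        deviation = (w - w.mean()).abs()
        excess = torch.relu(deviation - k * w.std())     # zero for ordinary, non-outlier weights
        terms.append(excess.pow(2).sum())
    return torch.stack(terms).sum() if terms else torch.zeros(())

# In a hypothetical FAT step the term would simply be added to the adversarial loss:
#   loss = F.cross_entropy(model(x_adv), y) + beta * weight_outlier_penalty(model)
```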

What carries the argument

The backdoor lens on CO, which treats the overfitting as a weak, class-distinguishable trigger that unifies it with unlearnable tasks and enables direct transfer of mitigation tactics.

If this is right

  • Backdoor-inspired recalibration of parameters restores generalization to unseen attacks (a linear-probing sketch follows this list).
  • A weight-outlier suppression constraint limits the abnormal weight deviations that accompany CO.
  • The shared framework predicts that techniques successful against backdoors will also reduce CO.
  • Unlearnable-task methods become applicable to diagnosing and preventing catastrophic overfitting.
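
As referenced in the first bullet, here is a minimal sketch of the linear-probing flavor of recalibration, assuming a torchvision-style network whose classifier head is exposed as `model.fc` and a `clean_loader` of training data: freeze the backbone, reinitialize the head, and retrain only the head. It illustrates the generic technique, not the authors' exact recipe or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_probe_recalibrate(model, clean_loader, epochs=5, lr=1e-2, device="cpu"):
    """Freeze the CO-affected backbone, reinitialize the head, and retrain only the head."""
    model.to(device).eval()              # eval mode keeps the frozen batch-norm statistics fixed
    for p in model.parameters():
        p.requires_grad_(False)
    # reinitialized classifier head (assumes a torchvision-style `.fc` attribute)
    model.fc = nn.Linear(model.fc.in_features, model.fc.out_features).to(device)
    opt = torch.optim.SGD(model.fc.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```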

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Standard backdoor detection tools could be repurposed to flag CO during training before it becomes catastrophic.
  • Joint study of CO and unlearnable examples may reveal shared dynamics that govern when data becomes unusable for robust learning.
  • The unification suggests that robustness benchmarks should test models against both adversarial and backdoor-style triggers.

Load-bearing premise

The listed phenomena (pathway division, diverse feature predictions, and universal class-distinguishable triggers) demonstrate a backdoor mechanism rather than ordinary overfitting or memorization.

What would settle it

A controlled run in which a model exhibits clear catastrophic overfitting yet shows none of the three backdoor indicators (pathway division, diverse predictions, or universal triggers) would refute the proposed mechanism.
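
The "clear catastrophic overfitting" half of that test has a standard operational form: robust accuracy under the single-step training attack stays high while robust accuracy under a multi-step attack collapses toward zero. A minimal sketch, assuming a classifier `model` and an evaluation batch `(x, y)` with pixels in [0, 1]; the attack implementations are generic FGSM/PGD, not the paper's code.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps, alpha=2/255, steps=10):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad, = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)  # project back to the ball
    return x_adv

def co_signature(model, x, y, eps=8/255):
    """Returns (FGSM-robust accuracy, PGD-robust accuracy); a large gap is the usual CO symptom."""
    model.eval()
    acc = lambda xa: (model(xa).argmax(dim=1) == y).float().mean().item()
    return acc(fgsm(model, x, y, eps)), acc(pgd(model, x, y, eps))
```

A model that shows this gap yet fails all three backdoor indicators would be the refuting case described above.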

Figures

Figures reproduced from arXiv: 2604.24350 by Baocai Yin, Bo Wang, Lihe Zhang, Mengnan Zhao, Tianhang Zheng.

Figure 1: Distance matrix and UMAP visualizations. view at source ↗
Figure 2: Distance matrix and UMAP visualizations. view at source ↗
Figure 3: Training dynamics of existing FAT methods. view at source ↗
Figure 4: Backdoor fine-tuning techniques. FT: finetuning. view at source ↗
Figure 5: Weight distribution relative to the mean weight. ‘Count’ means the distribution percentage. view at source ↗
Figure 6: Ablation studies comparing our Lreg against simpler alternatives. view at source ↗
Figure 7: Comparison between FGSM-based AT and PGD-based AT. view at source ↗
Figure 8: Ablation studies on the hyperparameter β. view at source ↗
read the original abstract

Fast Adversarial Training (FAT) has attracted significant attention due to its efficiency in enhancing neural network robustness against adversarial attacks. However, FAT is prone to catastrophic overfitting (CO), wherein models overfit to the specific attack used during training and fail to generalize to others. While existing methods introduce diverse hypotheses and propose various strategies to mitigate CO, a systematic and intuitive explanation of CO remains absent. In this work, we innovatively interpret CO through the lens of backdoor. Through validations on pathway division, diverse feature predictions, and universal class distinguishable triggers in CO, we conceptualize CO as a weak trigger variant of unlearnable tasks, unifying CO, backdoor attacks, and unlearnable tasks under a common theoretical framework. Guided by this, we leverage several backdoor inspired strategies to mitigate CO: (i) Recalibrate CO affected model parameters using vanilla fine tuning, linear probing, or reinitialization-based techniques; (ii) Introduce a weight outlier suppression constraint to regulate abnormal deviations in model weights. Extensive experiments support our interpretation of CO and show the efficacy of the proposed mitigation strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that catastrophic overfitting (CO) in fast adversarial training (FAT) is a weak-trigger variant of unlearnable tasks, thereby unifying CO, backdoor attacks, and unlearnable tasks under a single theoretical framework. It validates this view via observations of pathway division, diverse feature predictions, and universal class-distinguishable triggers, then proposes backdoor-inspired mitigations consisting of parameter recalibration (vanilla fine-tuning, linear probing, or reinitialization) and a weight-outlier suppression constraint. The abstract states that experiments support both the interpretation and the efficacy of the fixes.

Significance. If the backdoor interpretation holds and the mitigations prove robust, the work would supply a novel unifying lens on CO that could import techniques from the backdoor literature into adversarial training, potentially yielding more reliable and efficient robustness methods. The practical recalibration and regularization strategies are directly usable and could improve FAT in settings where speed is critical.

major comments (3)
  1. [validation experiments] The validations on pathway division, diverse feature predictions, and universal class-distinguishable triggers (described in the abstract and the validation section) do not include control experiments that would distinguish a backdoor (weak-trigger unlearnable-task) mechanism from standard explanations of CO such as memorization of attack-specific perturbation directions. Without such falsifying tests, the central interpretive claim remains compatible with non-backdoor accounts of overfitting.
  2. [theoretical framework / conceptualization] No formal definition, mathematical axioms, or precise characterization of the claimed 'common theoretical framework' is supplied. The conceptualization of CO as a 'weak trigger variant of unlearnable tasks' is introduced informally, which prevents rigorous verification of the unification and makes the framework non-load-bearing for the paper's conclusions.
  3. [experiments / abstract] The abstract asserts that 'extensive experiments support our interpretation' yet reports no quantitative metrics, error bars, ablation details, or comparisons against existing CO mitigation baselines. This absence undermines the ability to assess whether the proposed mitigations outperform prior methods or merely reproduce known regularization effects.
minor comments (2)
  1. [mitigation strategies] Notation for the weight-outlier suppression constraint should be defined explicitly (e.g., the precise form of the regularizer and its hyper-parameters) to allow reproduction.
  2. [conclusion] The manuscript would benefit from a dedicated limitations paragraph discussing the scope of the backdoor analogy (e.g., whether it applies only to specific attack norms or architectures).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major point below, providing our responses and indicating revisions to the manuscript.

read point-by-point responses
  1. Referee: [validation experiments] The validations on pathway division, diverse feature predictions, and universal class-distinguishable triggers (described in the abstract and the validation section) do not include control experiments that would distinguish a backdoor (weak-trigger unlearnable-task) mechanism from standard explanations of CO such as memorization of attack-specific perturbation directions. Without such falsifying tests, the central interpretive claim remains compatible with non-backdoor accounts of overfitting.

    Authors: We appreciate this suggestion for strengthening the interpretive claim. Our existing validations demonstrate phenomena such as pathway division and universal class-distinguishable triggers that are consistent with a weak-trigger backdoor mechanism. We agree that dedicated control experiments are needed to differentiate from alternatives like memorization of perturbation directions. In the revision, we will add control studies, including training with non-adversarial or random perturbations and testing for trigger universality in non-CO settings, to provide falsifying evidence (a sketch of one such control run appears after this exchange). revision: yes

  2. Referee: [theoretical framework / conceptualization] No formal definition, mathematical axioms, or precise characterization of the claimed 'common theoretical framework' is supplied. The conceptualization of CO as a 'weak trigger variant of unlearnable tasks' is introduced informally, which prevents rigorous verification of the unification and makes the framework non-load-bearing for the paper's conclusions.

    Authors: The unification is presented as a conceptual framework highlighting mechanistic parallels, such as weak triggers inducing overfitting to specific patterns across CO, backdoors, and unlearnable tasks. We acknowledge the benefit of greater precision. We will add a subsection with a more formal characterization, defining shared properties (e.g., trigger weakness leading to class-specific overfitting) to support verification while retaining the intuitive unification. revision: yes

  3. Referee: [experiments / abstract] The abstract asserts that 'extensive experiments support our interpretation' yet reports no quantitative metrics, error bars, ablation details, or comparisons against existing CO mitigation baselines. This absence undermines the ability to assess whether the proposed mitigations outperform prior methods or merely reproduce known regularization effects.

    Authors: The abstract is intentionally concise, while the body of the manuscript (experimental sections) reports quantitative metrics, error bars from multiple runs, ablation studies on recalibration techniques and outlier suppression, and direct comparisons to prior CO mitigation methods. To improve transparency, we will revise the abstract to briefly reference these key results and performance advantages. revision: yes
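
As a purely editorial illustration of the control study promised in response 1: train with random-sign perturbations of the same L-infinity budget used by FGSM-based FAT, then run the same indicator tests (for instance, the universal-trigger probe sketched earlier) on both the control model and a genuinely CO-affected one. `model` and `loader` are placeholders; the schedule and hyperparameters are assumptions, not the authors' setup.

```python
import torch
import torch.nn.functional as F

def train_with_random_perturbations(model, loader, eps=8/255, epochs=10, lr=0.1, device="cpu"):
    """Control run: same L-inf budget as FGSM-based FAT, but perturbations carry no gradient signal."""
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            noise = eps * torch.empty_like(x).uniform_(-1, 1).sign()   # random-sign, not adversarial
            loss = F.cross_entropy(model((x + noise).clamp(0, 1)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```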

Circularity Check

0 steps flagged

No significant circularity in interpretive unification of CO with backdoor/unlearnable tasks

full rationale

The paper's core move is an empirical interpretation: it reports observations of pathway division, diverse feature predictions, and class-distinguishable triggers within CO, then proposes to view CO as a weak-trigger variant of unlearnable tasks. This is presented as a unifying lens rather than a mathematical derivation, first-principles proof, or fitted model whose outputs are renamed as predictions. No equations appear that reduce to their own inputs by construction, no parameters are fitted on a subset and then called predictions on related quantities, and no load-bearing self-citations or uniqueness theorems are invoked. The mitigation strategies (fine-tuning, weight suppression) are motivated by the interpretation but remain independent empirical tests. The claimed common theoretical framework is therefore self-contained as an organizing perspective on existing phenomena, not a closed loop that forces the result from the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Only the abstract is available, so the ledger is necessarily incomplete. The central claim introduces one new conceptual entity (the weak-trigger variant) and relies on standard neural-network training assumptions plus the unstated premise that the listed validation phenomena are diagnostic of backdoors.

axioms (1)
  • domain assumption: Neural networks trained with fast adversarial training can develop class-specific feature pathways that are separable from normal decision boundaries.
    Invoked when the authors validate via pathway division.
invented entities (1)
  • weak trigger variant of unlearnable tasks (no independent evidence)
    purpose: To reinterpret catastrophic overfitting as a backdoor-like phenomenon
    The entity is postulated to unify three previously separate concepts; no independent falsifiable prediction (e.g., a specific trigger strength threshold) is stated in the abstract.

pith-pipeline@v0.9.0 · 5507 in / 1419 out tokens · 43171 ms · 2026-05-08T04:25:44.495095+00:00 · methodology

discussion (0)

