pith. machine review for the scientific record.

arxiv: 2604.24350 · v1 · submitted 2026-04-27 · 💻 cs.LG · cs.AI · cs.CR

Recognition: unknown

Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 04:25 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CR
keywords catastrophic overfitting · fast adversarial training · backdoor attacks · unlearnable tasks · adversarial robustness · neural network training · model interpretation

The pith

Catastrophic overfitting in fast adversarial training functions as a weak backdoor trigger variant of unlearnable tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks a systematic explanation for why fast adversarial training produces models that overfit catastrophically to one attack and lose robustness to others. It does so by examining training dynamics and finding signatures that match backdoor behavior: divided pathways in the network, inconsistent feature predictions across classes, and triggers that distinguish classes universally. A reader would care because this framing turns an opaque failure mode into a recognizable mechanism, opening the door to proven countermeasures from backdoor research instead of ad-hoc fixes.
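
To make the last of those signatures concrete, the sketch below (PyTorch-style Python, an editorial illustration rather than the paper's own protocol) probes for a universal class-distinguishable trigger: it optimizes one shared, eps-bounded perturbation against a frozen model and reports how often that single pattern drags arbitrary inputs into a chosen target class. The `model` and `images` arguments are placeholders for a FAT-trained classifier and an evaluation batch with pixels in [0, 1].

```python
import torch
import torch.nn.functional as F

def universal_trigger_rate(model, images, target_class, eps=8/255, steps=200, lr=1e-2):
    """Fraction of inputs pushed into target_class by one shared eps-bounded perturbation."""
    model.eval()
    # a single perturbation shared by every image, constrained to an L-inf ball
    delta = torch.zeros_like(images[:1], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.full((images.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        logits = model((images + delta).clamp(0, 1))
        loss = F.cross_entropy(logits, target)   # pull every input toward the target class
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)              # keep the trigger within the perturbation budget
    with torch.no_grad():
        pred = model((images + delta).clamp(0, 1)).argmax(dim=1)
    return (pred == target).float().mean().item()
```

A high success rate for a perturbation this weak is the kind of evidence the paper reads as backdoor-like.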

Core claim

We interpret catastrophic overfitting in fast adversarial training as a backdoor phenomenon, supported by evidence of pathway division, diverse feature predictions, and universal class-distinguishable triggers. This leads us to conceptualize CO as a weak trigger variant of unlearnable tasks, placing CO, backdoor attacks, and unlearnable tasks inside one theoretical framework. The same view directly motivates backdoor-style interventions: recalibrating parameters through fine-tuning, linear probing, or reinitialization, plus a weight-outlier suppression term to curb abnormal weight growth.
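
This review does not reproduce the paper's regularizer (its Eq. (12) Lreg), so the following is only a plausible sketch of what a weight-outlier suppression term could look like: a penalty on weight entries that drift more than k standard deviations from their layer-wise mean, added to the adversarial loss with a coefficient beta. The threshold rule and all names here are illustrative assumptions, not the authors' definition.

```python
import torch
import torch.nn as nn

def weight_outlier_penalty(model: nn.Module, k: float = 3.0) -> torch.Tensor:
    """Penalize weight entries lying more than k standard deviations from their layer mean."""
    terms = []
    for w in model.parameters():
        if w.dim() < 2:                                  # skip biases and normalization parameters
            continue
        deviation = (w - w.mean()).abs()
        excess = torch.relu(deviation - k * w.std())     # zero for ordinary, non-outlier weights
        terms.append(excess.pow(2).sum())
    return torch.stack(terms).sum() if terms else torch.zeros(())

# In a hypothetical FAT step the term would simply be added to the adversarial loss:
#   loss = F.cross_entropy(model(x_adv), y) + beta * weight_outlier_penalty(model)
```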

What carries the argument

The backdoor lens on CO, which treats the overfitting as a weak, class-distinguishable trigger that unifies it with unlearnable tasks and enables direct transfer of mitigation tactics.

If this is right

  • Backdoor-inspired recalibration of parameters restores generalization to unseen attacks (a linear-probing sketch follows this list).
  • A weight-outlier suppression constraint limits the abnormal weight deviations that accompany CO.
  • The shared framework predicts that techniques successful against backdoors will also reduce CO.
  • Unlearnable-task methods become applicable to diagnosing and preventing catastrophic overfitting.
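
As referenced in the first bullet, here is a minimal sketch of the linear-probing flavor of recalibration, assuming a torchvision-style network whose classifier head is exposed as `model.fc` and a `clean_loader` of training data: freeze the backbone, reinitialize the head, and retrain only the head. It illustrates the generic technique, not the authors' exact recipe or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_probe_recalibrate(model, clean_loader, epochs=5, lr=1e-2, device="cpu"):
    """Freeze the CO-affected backbone, reinitialize the head, and retrain only the head."""
    model.to(device).eval()              # eval mode keeps the frozen batch-norm statistics fixed
    for p in model.parameters():
        p.requires_grad_(False)
    # reinitialized classifier head (assumes a torchvision-style `.fc` attribute)
    model.fc = nn.Linear(model.fc.in_features, model.fc.out_features).to(device)
    opt = torch.optim.SGD(model.fc.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```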

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Standard backdoor detection tools could be repurposed to flag CO during training before it becomes catastrophic.
  • Joint study of CO and unlearnable examples may reveal shared dynamics that govern when data becomes unusable for robust learning.
  • The unification suggests that robustness benchmarks should test models against both adversarial and backdoor-style triggers.

Load-bearing premise

The listed phenomena (pathway division, diverse feature predictions, and universal class-distinguishable triggers) demonstrate a backdoor mechanism rather than ordinary overfitting or memorization.

What would settle it

A controlled run in which a model exhibits clear catastrophic overfitting yet shows none of the three backdoor indicators (pathway division, diverse predictions, or universal triggers) would refute the proposed mechanism.
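
The "clear catastrophic overfitting" half of that test has a standard operational form: robust accuracy under the single-step training attack stays high while robust accuracy under a multi-step attack collapses toward zero. A minimal sketch, assuming a classifier `model` and an evaluation batch `(x, y)` with pixels in [0, 1]; the attack implementations are generic FGSM/PGD, not the paper's code.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps, alpha=2/255, steps=10):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad, = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)  # project back to the ball
    return x_adv

def co_signature(model, x, y, eps=8/255):
    """Returns (FGSM-robust accuracy, PGD-robust accuracy); a large gap is the usual CO symptom."""
    model.eval()
    acc = lambda xa: (model(xa).argmax(dim=1) == y).float().mean().item()
    return acc(fgsm(model, x, y, eps)), acc(pgd(model, x, y, eps))
```

A model that shows this gap yet fails all three backdoor indicators would be the refuting case described above.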

Figures

Figures reproduced from arXiv: 2604.24350 by Baocai Yin, Bo Wang, Lihe Zhang, Mengnan Zhao, Tianhang Zheng.

Figure 1: Distance matrix and UMAP visualizations. view at source ↗
Figure 2: Distance matrix and UMAP visualizations. view at source ↗
Figure 3: Training dynamics of existing FAT methods. view at source ↗
Figure 4: Backdoor fine-tuning techniques. FT: finetuning. view at source ↗
Figure 5: Weight distribution relative to the mean weight. ‘Count’ means the distribution percentage. view at source ↗
Figure 6: Ablation studies comparing our Lreg against simpler alternatives. view at source ↗
Figure 7: Comparison between FGSM-based AT and PGD-based AT. view at source ↗
Figure 8: Ablation studies on the hyperparameter β. view at source ↗
read the original abstract

Fast Adversarial Training (FAT) has attracted significant attention due to its efficiency in enhancing neural network robustness against adversarial attacks. However, FAT is prone to catastrophic overfitting (CO), wherein models overfit to the specific attack used during training and fail to generalize to others. While existing methods introduce diverse hypotheses and propose various strategies to mitigate CO, a systematic and intuitive explanation of CO remains absent. In this work, we innovatively interpret CO through the lens of backdoor. Through validations on pathway division, diverse feature predictions, and universal class distinguishable triggers in CO, we conceptualize CO as a weak trigger variant of unlearnable tasks, unifying CO, backdoor attacks, and unlearnable tasks under a common theoretical framework. Guided by this, we leverage several backdoor inspired strategies to mitigate CO: (i) Recalibrate CO affected model parameters using vanilla fine tuning, linear probing, or reinitialization-based techniques; (ii) Introduce a weight outlier suppression constraint to regulate abnormal deviations in model weights. Extensive experiments support our interpretation of CO and show the efficacy of the proposed mitigation strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that catastrophic overfitting (CO) in fast adversarial training (FAT) is a weak-trigger variant of unlearnable tasks, thereby unifying CO, backdoor attacks, and unlearnable tasks under a single theoretical framework. It validates this view via observations of pathway division, diverse feature predictions, and universal class-distinguishable triggers, then proposes backdoor-inspired mitigations consisting of parameter recalibration (vanilla fine-tuning, linear probing, or reinitialization) and a weight-outlier suppression constraint. The abstract states that experiments support both the interpretation and the efficacy of the fixes.

Significance. If the backdoor interpretation holds and the mitigations prove robust, the work would supply a novel unifying lens on CO that could import techniques from the backdoor literature into adversarial training, potentially yielding more reliable and efficient robustness methods. The practical recalibration and regularization strategies are directly usable and could improve FAT in settings where speed is critical.

major comments (3)
  1. [validation experiments] The validations on pathway division, diverse feature predictions, and universal class-distinguishable triggers (described in the abstract and the validation section) do not include control experiments that would distinguish a backdoor (weak-trigger unlearnable-task) mechanism from standard explanations of CO such as memorization of attack-specific perturbation directions. Without such falsifying tests, the central interpretive claim remains compatible with non-backdoor accounts of overfitting.
  2. [theoretical framework / conceptualization] No formal definition, mathematical axioms, or precise characterization of the claimed 'common theoretical framework' is supplied. The conceptualization of CO as a 'weak trigger variant of unlearnable tasks' is introduced informally, which prevents rigorous verification of the unification and makes the framework non-load-bearing for the paper's conclusions.
  3. [experiments / abstract] The abstract asserts that 'extensive experiments support our interpretation' yet reports no quantitative metrics, error bars, ablation details, or comparisons against existing CO mitigation baselines. This absence undermines the ability to assess whether the proposed mitigations outperform prior methods or merely reproduce known regularization effects.
minor comments (2)
  1. [mitigation strategies] Notation for the weight-outlier suppression constraint should be defined explicitly (e.g., the precise form of the regularizer and its hyper-parameters) to allow reproduction.
  2. [conclusion] The manuscript would benefit from a dedicated limitations paragraph discussing the scope of the backdoor analogy (e.g., whether it applies only to specific attack norms or architectures).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major point below, providing our responses and indicating revisions to the manuscript.

read point-by-point responses
  1. Referee: [validation experiments] The validations on pathway division, diverse feature predictions, and universal class-distinguishable triggers (described in the abstract and the validation section) do not include control experiments that would distinguish a backdoor (weak-trigger unlearnable-task) mechanism from standard explanations of CO such as memorization of attack-specific perturbation directions. Without such falsifying tests, the central interpretive claim remains compatible with non-backdoor accounts of overfitting.

    Authors: We appreciate this suggestion for strengthening the interpretive claim. Our existing validations demonstrate phenomena such as pathway division and universal class-distinguishable triggers that are consistent with a weak-trigger backdoor mechanism. We agree that dedicated control experiments are needed to differentiate from alternatives like memorization of perturbation directions. In the revision, we will add control studies, including training with non-adversarial or random perturbations and testing for trigger universality in non-CO settings, to provide falsifying evidence (a sketch of one such control run appears after this exchange). revision: yes

  2. Referee: [theoretical framework / conceptualization] No formal definition, mathematical axioms, or precise characterization of the claimed 'common theoretical framework' is supplied. The conceptualization of CO as a 'weak trigger variant of unlearnable tasks' is introduced informally, which prevents rigorous verification of the unification and makes the framework non-load-bearing for the paper's conclusions.

    Authors: The unification is presented as a conceptual framework highlighting mechanistic parallels, such as weak triggers inducing overfitting to specific patterns across CO, backdoors, and unlearnable tasks. We acknowledge the benefit of greater precision. We will add a subsection with a more formal characterization, defining shared properties (e.g., trigger weakness leading to class-specific overfitting) to support verification while retaining the intuitive unification. revision: yes

  3. Referee: [experiments / abstract] The abstract asserts that 'extensive experiments support our interpretation' yet reports no quantitative metrics, error bars, ablation details, or comparisons against existing CO mitigation baselines. This absence undermines the ability to assess whether the proposed mitigations outperform prior methods or merely reproduce known regularization effects.

    Authors: The abstract is intentionally concise, while the body of the manuscript (experimental sections) reports quantitative metrics, error bars from multiple runs, ablation studies on recalibration techniques and outlier suppression, and direct comparisons to prior CO mitigation methods. To improve transparency, we will revise the abstract to briefly reference these key results and performance advantages. revision: yes
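
As a purely editorial illustration of the control study promised in response 1: train with random-sign perturbations of the same L-infinity budget used by FGSM-based FAT, then run the same indicator tests (for instance, the universal-trigger probe sketched earlier) on both the control model and a genuinely CO-affected one. `model` and `loader` are placeholders; the schedule and hyperparameters are assumptions, not the authors' setup.

```python
import torch
import torch.nn.functional as F

def train_with_random_perturbations(model, loader, eps=8/255, epochs=10, lr=0.1, device="cpu"):
    """Control run: same L-inf budget as FGSM-based FAT, but perturbations carry no gradient signal."""
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            noise = eps * torch.empty_like(x).uniform_(-1, 1).sign()   # random-sign, not adversarial
            loss = F.cross_entropy(model((x + noise).clamp(0, 1)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```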

Circularity Check

0 steps flagged

No significant circularity in interpretive unification of CO with backdoor/unlearnable tasks

full rationale

The paper's core move is an empirical interpretation: it reports observations of pathway division, diverse feature predictions, and class-distinguishable triggers within CO, then proposes to view CO as a weak-trigger variant of unlearnable tasks. This is presented as a unifying lens rather than a mathematical derivation, first-principles proof, or fitted model whose outputs are renamed as predictions. No equations appear that reduce to their own inputs by construction, no parameters are fitted on a subset and then called predictions on related quantities, and no load-bearing self-citations or uniqueness theorems are invoked. The mitigation strategies (fine-tuning, weight suppression) are motivated by the interpretation but remain independent empirical tests. The claimed common theoretical framework is therefore self-contained as an organizing perspective on existing phenomena, not a closed loop that forces the result from the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Only the abstract is available, so the ledger is necessarily incomplete. The central claim introduces one new conceptual entity (the weak-trigger variant) and relies on standard neural-network training assumptions plus the unstated premise that the listed validation phenomena are diagnostic of backdoors.

axioms (1)
  • domain assumption: Neural networks trained with fast adversarial training can develop class-specific feature pathways that are separable from normal decision boundaries.
    Invoked when the authors validate via pathway division.
invented entities (1)
  • weak trigger variant of unlearnable tasks (no independent evidence)
    purpose: To reinterpret catastrophic overfitting as a backdoor-like phenomenon
    The entity is postulated to unify three previously separate concepts; no independent falsifiable prediction (e.g., a specific trigger strength threshold) is stated in the abstract.

pith-pipeline@v0.9.0 · 5507 in / 1419 out tokens · 43171 ms · 2026-05-08T04:25:44.495095+00:00 · methodology

discussion (0)

