Catastrophic Overfitting, Entropy Gap and Participation Ratio: A Noiseless $l^p$ Norm Solution for Fast Adversarial Training

Fares B. Mehouachi; Saif Eddin Jabari

arxiv: 2505.02360 · v2 · pith:IL5F52LVnew · submitted 2025-05-05 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-22 16:43 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords adversarial training · catastrophic overfitting · l^p norm · participation ratio · entropy gap · FGSM · gradient concentration · robustness

The pith

Tuning the l^p training norm adaptively using participation ratio and entropy prevents catastrophic overfitting in fast adversarial training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that catastrophic overfitting arises in fast adversarial training when single-step methods like FGSM produce models that resist only weak attacks but collapse against stronger multi-step ones. This happens more often under the l^∞ norm than under l^2 because concentrated gradients interact badly with strict norm constraints. The authors model generalized l^p attacks as a fixed-point problem, then build an adaptive l^p-FGSM that chooses the norm at each step by tracking how spread out the gradients are via participation ratio and entropy. If the claim holds, it supplies a simple, noiseless way to train robust networks that does not rely on extra regularization, noise injection, or slower multi-step attacks.

Core claim

Catastrophic overfitting emerges when highly concentrated gradients, where information localizes in few dimensions, meet aggressive norm constraints. By quantifying this concentration through participation ratio and entropy gap, the authors construct an adaptive l^p-FGSM that automatically selects the training norm to avoid the failure mode, achieving strong robustness to multi-step attacks without additional techniques.
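The two concentration measures named in the core claim can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code. The participation ratio follows the PR1 formula quoted later on this page, (||g||_1 / ||g||_2)^2. The entropy gap is an assumption: it is taken here as H_m − H, with H the Shannon entropy of the normalized magnitudes |g_i| / ||g||_1 and H_m = log d, and the paper may normalize differently. The function names are hypothetical.

```python
import numpy as np

def participation_ratio_l1(grad):
    """PR1 = (||g||_1 / ||g||_2)^2: near 1 for one dominant coordinate, d for uniform magnitudes."""
    g = np.abs(np.ravel(grad)).astype(np.float64)
    l2 = np.linalg.norm(g)
    if l2 == 0.0:
        return 0.0  # convention chosen here for an all-zero gradient
    return float((g.sum() / l2) ** 2)

def entropy_gap(grad):
    """Assumed definition: log(d) minus the Shannon entropy of |g_i| / ||g||_1."""
    g = np.abs(np.ravel(grad)).astype(np.float64)
    total = g.sum()
    if total == 0.0:
        return 0.0  # convention chosen here for an all-zero gradient
    prob = g / total
    nonzero = prob[prob > 0]  # 0 * log(0) is treated as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(np.log(g.size) - entropy)

d = 1024
uniform = np.ones(d)   # perfectly spread-out gradient
spiky = np.zeros(d)    # gradient concentrated on 4 coordinates
spiky[:4] = 1.0

print(participation_ratio_l1(uniform), entropy_gap(uniform))
print(participation_ratio_l1(spiky), entropy_gap(spiky))
```

Under these definitions a spread-out gradient has a large participation ratio and a near-zero entropy gap, while a concentrated one has a small participation ratio and a large gap, which is the regime the paper associates with catastrophic overfitting.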

What carries the argument

The adaptive l^p-FGSM, which treats the generalized l^p attack as a fixed-point problem and selects the norm p at each training step according to the participation ratio and entropy of the current gradients.
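For orientation, the single-step l^p attack has a textbook closed form in the linearized setting: by Hölder's inequality, the perturbation that maximizes the inner product with the gradient under an l^p budget ε is proportional to sign(g)·|g|^(q−1), where q is the dual exponent with 1/p + 1/q = 1. The sketch below implements only that standard formula. It is not the paper's fixed-point formulation or its adaptive rule, and the function name `lp_ascent_direction` is hypothetical.

```python
import numpy as np

def lp_ascent_direction(grad, p, eps):
    """Maximizer of <grad, delta> subject to ||delta||_p <= eps, for 1 < p < inf."""
    g = np.ravel(grad).astype(np.float64)
    q = p / (p - 1.0)                  # dual exponent: 1/p + 1/q = 1
    scale = np.max(np.abs(g))
    if scale == 0.0:
        return np.zeros_like(g)        # no ascent direction for a zero gradient
    a = np.abs(g) / scale              # rescaling leaves the direction unchanged
    w = np.sign(g) * a ** (q - 1.0)
    return eps * w / np.linalg.norm(w, ord=p)  # enforce ||delta||_p = eps

g = np.array([3.0, -4.0, 0.5, 0.0])
print(lp_ascent_direction(g, 2.0, 1.0))    # p = 2: normalized gradient
print(lp_ascent_direction(g, 8.0, 1.0))    # intermediate p
print(lp_ascent_direction(g, 400.0, 1.0))  # large p: approaches sign(g)
```

At p = 2 the step is the normalized gradient, and as p grows the exponent q − 1 shrinks toward zero so the step approaches the FGSM sign step. Intermediate p values therefore interpolate between the two regimes the paper contrasts.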

If this is right

Single-step adversarial training reaches multi-step robustness levels without noise or regularization.
The choice of l^p norm can be made data-driven rather than fixed in advance.
Gradient concentration measures become practical diagnostics for training stability.
Fast adversarial training becomes viable for larger models where multi-step methods are too slow.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar adaptive-norm logic could stabilize other optimization problems that suffer from gradient sparsity or concentration.
Participation ratio tracking might diagnose related issues in standard supervised training or pruning.
The entropy-gap formulation invites tests on whether the same signals predict robustness in non-adversarial settings.

Load-bearing premise

That catastrophic overfitting is produced specifically by the interaction between highly concentrated gradients and aggressive norm constraints, and that automatically tuning the l^p norm from participation ratio and entropy is sufficient to block it.

What would settle it

An experiment in which the adaptive l^p method is applied but models still exhibit catastrophic overfitting whenever gradient participation ratio remains low, or in which fixed-norm l^2 or l^infty training matches the adaptive method's robustness.

Figures

Figures reproduced from arXiv: 2505.02360 by Fares B. Mehouachi, Saif Eddin Jabari.

**Figure 1.** CO phenomena on CIFAR-10 [29] using WideResNet-28-10 [30]. Upper: l^∞ training (ϵ = 8/255) shows accuracy collapse against PGD-50 (ϵ = 8/255) [15] attacks, while l^2 (ϵ = 32/255, both training and attack) remains stable. Lower: CO onset in l^∞ training correlates with gradient norm increase, absent in l^2 training (norms normalized at epoch 1). … or regularization [25, 26], our method achieves superior perform…

**Figure 2.** Impact of l^p norm choice on training dynamics and robustness for CIFAR-10 with WideResNet-28-10. The choice of p reveals a key trade-off: higher values (p ≥ 32) initially show better robustness but become vulnerable to Catastrophic Overfitting (CO), evident in the l^∞ PGD-50 plot (second left). Lower p values prevent CO but with reduced adversarial robustness. Notably, l^2 PGD-50 accuracy (rightmost) remai…

**Figure 3.** Depiction of training effect on CIFAR-10’s loss

**Figure 5.** Variation of the l^p transition function Υ_p for different values of p. The high-pass filtering effect mirrors the thresholding behavior in ZeroGrad [33]. Lipschitzness of F_p: For p > 2, global Lipschitz continuity fails due to the discontinuous sign function and concave power term q − 1 at null gradients. However, local Lipschitzness suffices via Banach contraction when gradients are bounded away from ze…

**Figure 4.** Illustration of the initial two ascents of the fixed

**Figure 6.** Clean and adversarial accuracy across datasets

**Figure 7.** Evolution of Participation Ratios (PR, PR1) and entropy gap during training. Sharp declines in these metrics align with the onset of Catastrophic Overfitting (CO), highlighting the link between gradient concentration and adversarial vulnerability. Same experimental setting as

**Figure 8.** Effect of the l^p norm on attack geometry and sensitivity to gradient noise. Left: An ideal scenario, where the angles between δ_2, δ_∞, and any δ_p are zero. Right: Under small gradient noise (common in ML), l^∞ shows high sensitivity with large angular separation, whereas l^p yields more stable attacks with better gradient alignment (higher cosine similarity). … where ∆H = H_m − H is the Entropy Gap, H is the …

**Figure 9.** Performance benchmarking of adaptive l^p norm-based training against single-step and fast adversarial techniques using PGD-50-10, demonstrating the competitive efficacy of adaptive l^p-FGSM. Results were achieved with an SGD optimizer with a cosine learning rate schedule (30 epochs, minimum 0.001, maximum 0.2), weight decay of 5 · 10^−4, and a dropout rate of 0.1. For SVHN and CIFAR10, β = 0.01 was appl…

**Figure 10.** Comparative evaluation using AutoAttack on CIFAR-10 with WideResNet-28-10 across different perturbation

**Figure 11.** Extended training performance of l^p-FGSM on CIFAR-10. While Catastrophic Overfitting (CO) was not observed, the experiment highlights the occurrence of robust overfitting over a prolonged training period. The results of this long-term training provide insightful observations. Crucially, no instances of Catastrophic Overfitting (CO) were detected throughout the training process, underscoring the robustne…

**Figure 12.** Analysis of ε-softening and noise effects on CIFAR-10 using WideResNet-28-10 against PGD-50 (ϵ = 8/255). Left: Effect of ε-softening on clean (dashed) and adversarial (solid) accuracy for various p values. Optimal ε enhances stability against CO. Right: Synergistic effects of noise injection showing improved robustness against CO and enhanced overall accuracy. The results demonstrate that both components …

**Figure 13.** Evolution of Participation Ratios (
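The noise-sensitivity contrast described in the Figure 8 caption can be reproduced on a synthetic gradient. The sketch below is a toy illustration under stated assumptions, not the paper's experiment. The gradient has 10 dominant coordinates on a weak background, and the noise level, the dimension, and the helper names `lp_direction` and `cosine_under_noise` are all made up for the example. Each attack direction is computed once from the clean gradient and once from the noisy one, and the two are compared by cosine similarity.

```python
import numpy as np

def lp_direction(g, p):
    """Unit-l2 direction of the l^p steepest-ascent step; p = np.inf gives sign(g)."""
    if np.isinf(p):
        w = np.sign(g)
    else:
        q = p / (p - 1.0)  # dual exponent
        w = np.sign(g) * (np.abs(g) / np.max(np.abs(g))) ** (q - 1.0)
    return w / np.linalg.norm(w)

def cosine_under_noise(g, noise, p):
    """Cosine similarity between the attack on g and the attack on g + noise."""
    return float(lp_direction(g, p) @ lp_direction(g + noise, p))

rng = np.random.default_rng(0)
d = 1000
g = 1e-3 * rng.standard_normal(d)      # weak background gradient (assumed scale)
g[:10] = 1.0                           # a few dominant coordinates: concentrated gradient
noise = 1e-3 * rng.standard_normal(d)  # small gradient noise (assumed scale)

cos_2 = cosine_under_noise(g, noise, 2.0)
cos_4 = cosine_under_noise(g, noise, 4.0)
cos_inf = cosine_under_noise(g, noise, np.inf)
print(cos_2, cos_4, cos_inf)
```

In this setup the sign step weights every coordinate equally, so noise that flips the sign of the many weak coordinates rotates the l^∞ attack substantially, whereas finite p keeps most of the weight on the dominant coordinates. The exact cosine values depend on the seed and the assumed scales.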

Original abstract

Adversarial training is a cornerstone of robust deep learning, but fast methods like the Fast Gradient Sign Method (FGSM) often suffer from Catastrophic Overfitting (CO), where models become robust to single-step attacks but fail against multi-step variants. While existing solutions rely on noise injection, regularization, or gradient clipping, we propose a novel solution that purely controls the $l^p$ training norm to mitigate CO. Our study is motivated by the empirical observation that CO is more prevalent under the $l^{\infty}$ norm than the $l^2$ norm. Leveraging this insight, we develop a framework for generalized $l^p$ attack as a fixed point problem and craft $l^p$-FGSM attacks to understand the transition mechanics from $l^2$ to $l^{\infty}$. This leads to our core insight: CO emerges when highly concentrated gradients, where information localizes in few dimensions, interact with aggressive norm constraints. By quantifying gradient concentration through Participation Ratio and entropy measures, we develop an adaptive $l^p$-FGSM that automatically tunes the training norm based on gradient information. Extensive experiments demonstrate that this approach achieves strong robustness without requiring additional regularization or noise injection, providing a novel and theoretically principled pathway to mitigate the CO problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Adaptive l^p tuning via participation ratio and entropy gives a clean empirical route around CO, but the causal isolation of those metrics is still thin.

The headline here is that the authors replace noise or regularization with an adaptive choice of training norm p, picked from the participation ratio and entropy of the gradients, and report that this alone keeps FGSM training from collapsing into catastrophic overfitting. They motivate it by noting CO is worse under l^∞ than l^2, then treat the l^p attack as a fixed-point problem to trace the transition and link the failure mode to highly concentrated gradients.

Referee Report

3 major / 2 minor

Summary. The paper claims that catastrophic overfitting (CO) in fast adversarial training with FGSM arises from the interaction of highly concentrated gradients with aggressive norm constraints, and proposes an adaptive l^p-FGSM method that automatically tunes the training norm p using participation ratio and entropy measures of the gradients. It develops a fixed-point framework for generalized l^p attacks to analyze the transition from l^2 to l^∞ norms and reports that this noiseless approach achieves strong robustness without regularization or noise injection.

Significance. If the central claim holds, the work offers a new mechanism-based approach to CO mitigation that avoids common interventions like noise or clipping, potentially simplifying robust training. The quantification of gradient concentration via participation ratio and entropy provides a concrete diagnostic for norm choice, and the fixed-point formulation for l^p attacks is a useful technical contribution if the derivations are rigorous.

major comments (3)

[§3] §3 (fixed-point framework for l^p-FGSM): The core claim that CO emerges specifically from concentrated gradients interacting with norm constraints is load-bearing, yet the derivation does not demonstrate why participation ratio and entropy are necessary and sufficient diagnostics rather than alternatives such as gradient sparsity or Hessian-based measures; an explicit isolation argument or counterexample is needed to support the 'theoretically-principled' assertion.
[Adaptive rule] Adaptive rule and mapping (around Eq. for p selection): The adaptive tuning is presented as automatic and based on gradient information, but the manuscript does not clarify whether the mapping from participation ratio to p introduces fitted constants or hyperparameters that are later used to claim success, raising a circularity concern for the parameter-free interpretation.
[Experiments] Experimental validation (results tables/figures): While extensive experiments are reported, the absence of ablations that fix p while matching the same participation ratio and entropy statistics leaves the causal link between the proposed metrics and CO prevention untested; this undermines the claim that the method alone is sufficient without other interventions.

minor comments (2)

[Preliminaries] Notation for participation ratio and entropy gap should be defined more explicitly with respect to the gradient vector dimensions to avoid ambiguity in replication.
[Figures] Figure captions for the l^p transition plots could include the exact p values used in each panel for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the presentation of our contributions. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Referee: [§3] §3 (fixed-point framework for l^p-FGSM): The core claim that CO emerges specifically from concentrated gradients interacting with norm constraints is load-bearing, yet the derivation does not demonstrate why participation ratio and entropy are necessary and sufficient diagnostics rather than alternatives such as gradient sparsity or Hessian-based measures; an explicit isolation argument or counterexample is needed to support the 'theoretically-principled' assertion.

Authors: We appreciate the referee's emphasis on strengthening the justification for our choice of diagnostics. The fixed-point framework in §3 derives the conditions under which gradient concentration interacts with the l^∞ constraint to produce CO, and participation ratio together with entropy are selected because they quantify the effective support and information localization of the gradient vector in a manner directly tied to those conditions. We do not claim these are the only possible measures, nor do we provide a full isolation proof against every alternative. In the revision we will expand the discussion in §3 to include a comparison with gradient sparsity and a brief note on why Hessian-based alternatives are less directly connected to the single-step norm transition analyzed in the fixed-point formulation. revision: yes
Referee: [Adaptive rule] Adaptive rule and mapping (around Eq. for p selection): The adaptive tuning is presented as automatic and based on gradient information, but the manuscript does not clarify whether the mapping from participation ratio to p introduces fitted constants or hyperparameters that are later used to claim success, raising a circularity concern for the parameter-free interpretation.

Authors: The mapping is obtained by identifying the participation-ratio thresholds at which the fixed-point analysis predicts the onset of adverse norm-gradient interaction; these thresholds are fixed by the theoretical transition points and do not involve constants fitted to robustness metrics or validation performance. Consequently the rule remains free of data-dependent hyperparameters. We will revise the text around the relevant equation to state this derivation explicitly and to confirm that no post-hoc fitting was performed. revision: yes
Referee: [Experiments] Experimental validation (results tables/figures): While extensive experiments are reported, the absence of ablations that fix p while matching the same participation ratio and entropy statistics leaves the causal link between the proposed metrics and CO prevention untested; this undermines the claim that the method alone is sufficient without other interventions.

Authors: We agree that a controlled ablation holding p fixed while matching the observed participation-ratio and entropy statistics would provide stronger causal evidence. Because these statistics are themselves functions of the chosen training norm, constructing such matched conditions requires additional experimental design. Our current results already compare the adaptive rule against fixed-p baselines and track the evolution of the metrics during training. In the revision we will add supplementary figures that plot participation ratio and entropy trajectories for the fixed-p runs, thereby making the correlation with CO more explicit. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is empirical and self-contained.

The paper starts from the empirical observation that CO occurs more under l^∞ than l², frames the generalized l^p attack as a fixed-point problem, and introduces Participation Ratio plus entropy as quantifiers of gradient concentration to drive an adaptive choice of p. No equation in the provided text defines the chosen p or the concentration metrics in terms of the final robustness metric, nor does any step rename a fitted parameter as a prediction or reduce the claimed sufficiency to a self-citation chain. The central pathway is therefore supported by external experimental validation rather than by construction from its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the empirical observation that CO is worse under l^∞ than l^2 and on the modeling choice that gradient concentration measured by participation ratio drives the need for adaptive norm selection. No explicit free parameters are named in the abstract, but the adaptive rule likely introduces at least one tunable threshold or mapping from participation ratio to p.

free parameters (1)

mapping from participation ratio to p
The adaptive rule that selects or tunes the norm order p based on measured gradient concentration is not specified as parameter-free.

axioms (1)

domain assumption: CO emerges when highly concentrated gradients interact with aggressive norm constraints
This is presented as the core insight motivating the adaptive l^p-FGSM.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

CO emerges when highly concentrated gradients, where information localizes in few dimensions, interact with aggressive norm constraints. By quantifying gradient concentration through Participation Ratio and entropy measures, we develop an adaptive l^p-FGSM that automatically tunes the training norm based on gradient information.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

$\mathrm{PR}_1 = (\|\nabla_x \ell\|_1 / \|\nabla_x \ell\|_2)^2$ ... $\cos(\theta_{2,\infty}) = \sqrt{\mathrm{PR}_1 / d}$ ... $q^* \ge 1 + (\tau \sqrt{d/\mathrm{PR}_1} - 1)/\Delta H$
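The cosine identity in the passage quoted above can be checked in one line. The derivation below is a reader's reconstruction, not taken from the paper. It writes g for ∇_x ℓ and assumes that the l^2 attack direction is proportional to g, that the l^∞ attack direction is sign(g), and that g has no exactly zero entries, so that sign(g) has l^2 norm √d.

```latex
\cos\theta_{2,\infty}
  = \frac{\langle g,\ \operatorname{sign}(g)\rangle}
         {\|g\|_2 \,\|\operatorname{sign}(g)\|_2}
  = \frac{\|g\|_1}{\|g\|_2 \,\sqrt{d}}
  = \sqrt{\frac{\mathrm{PR}_1}{d}},
\qquad
\mathrm{PR}_1 = \left(\frac{\|g\|_1}{\|g\|_2}\right)^{2}.
```

Two limiting cases make the reading concrete. When all gradient magnitudes are equal, PR_1 = d and the two attack directions coincide. When nearly all of the gradient mass sits on one coordinate, PR_1 approaches 1 and the cosine approaches 1/√d, so the sign step points almost orthogonally to the gradient in high dimension.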

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FlowMixer: A Depth-Agnostic Neural Architecture for Interpretable Spatiotemporal Forecasting
cs.LG 2025-05 unverdicted novelty 5.0

A single-layer architecture called FlowMixer uses constrained matrix operations and a semi-group property to enable depth-agnostic, interpretable spatiotemporal forecasting with direct eigenmode extraction.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · cited by 1 Pith paper · 5 internal anchors

[1]

Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups

Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012

[2]

Deep learning

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015

[3]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017

[4]

Intriguing properties of neural networks

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013

[5]

Explaining and Harnessing Adversarial Examples

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014

[6]

Multilabel black-box adversarial attacks only with predicted labels

Linghao Kong, Wenjian Luo, Zipeng Ye, Qi Zhou, and Yan Jia. Multilabel black-box adversarial attacks only with predicted labels. IEEE Transactions on Artificial Intelligence, 6(5):1284–1297, 2025

[7]

Rethinking transferable adversarial attacks with double adversarial neuron attribution

Zhiyu Zhu, Zhibo Jin, Xinyi Wang, Jiayu Zhang, Huaming Chen, and Kim-Kwang Raymond Choo. Rethinking transferable adversarial attacks with double adversarial neuron attribution. IEEE Transactions on Artificial Intelligence, 6(2):354–364, 2025

[8]

Functional safety for machine learning: a case study in automotive software

Léonard Humbert, Michael Wagner, and Philip Koopman. Functional safety for machine learning: a case study in automotive software. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, pages 1739–1746, 2020

[9]

Dynamic risk assessment for autonomous vehicle safety

Michael Wagner and Philip Koopman. Dynamic risk assessment for autonomous vehicle safety. Journal of Systems and Software, 168:110598, 2020

[10]

Detection and identification of uavs based on spectrum monitoring and deep learning in negative snr conditions

F Mehouachi, Juan Galvis, Santiago Morales, Milosch Meriac, Felix Vega, and Chaouki Kasmi. Detection and identification of uavs based on spectrum monitoring and deep learning in negative snr conditions. URSI GASS, 2021

[11]

On the vulnerability of deep reinforcement learning to backdoor attacks in autonomous vehicles

Yue Wang, Esha Sarkar, Saif Eddin Jabari, and Michail Maniatakos. On the vulnerability of deep reinforcement learning to backdoor attacks in autonomous vehicles. In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing: Use Cases and Emerging Challenges, pages 315–341. Springer, 2023

[12]

Adversarial attacks on medical machine learning

Samuel G Finlayson, John D Bowers, Joichi Ito, Jonathan L Zittrain, Andrew L Beam, and Isaac S Kohane. Adversarial attacks on medical machine learning. Science, 363(6433):1287–1289, 2019

[13]

Adversarial attacks on deep models for financial transaction records

Ivan Fursov, Matvey Morozov, Nina Kaploukhaya, Elizaveta Kovtun, Rodrigo Rivera-Castro, Gleb Gusev, Dmitry Babaev, Ivan Kireev, Alexey Zaytsev, and Evgeny Burnaev. Adversarial attacks on deep models for financial transaction records. arXiv preprint arXiv:2106.08361, 2021

[14]

Adversarial attacks on machine learning systems for high-frequency trading

Micah Goldblum, Avi Schwarzschild, Ankit B Patel, and Tom Goldstein. Adversarial attacks on machine learning systems for high-frequency trading. arXiv preprint arXiv:2002.09565, 2020

[15]

Towards Deep Learning Models Resistant to Adversarial Attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017

[16]

Robustness of classifiers: from adversarial to random noise

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. Advances in Neural Information Processing Systems, 2018

[17]

Theoretically principled trade-off between robustness and accuracy

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International conference on machine learning, pages 7472–7482. PMLR, 2019

[18]

Building a robust and efficient defensive system using hybrid adversarial attack

Rachel Selva Dhanaraj and M. Sridevi. Building a robust and efficient defensive system using hybrid adversarial attack. IEEE Transactions on Artificial Intelligence, 5(9):4470–4478, 2024

[19]

Apr-net: Defense against adversarial examples based on universal adversarial perturbation removal network

Wenxing Liao, Zhuxian Liu, Minghuang Shen, Riqing Chen, and Xiaolong Liu. Apr-net: Defense against adversarial examples based on universal adversarial perturbation removal network. IEEE Transactions on Artificial Intelligence, 6(4):945–954, 2025

[20]

Adversarial machine learning for social good: Reframing the adversary as an ally

Shawqi Al-Maliki, Adnan Qayyum, Hassan Ali, Mohamed Abdallah, Junaid Qadir, Dinh Thai Hoang, Dusit Niyato, and Ala Al-Fuqaha. Adversarial machine learning for social good: Reframing the adversary as an ally. IEEE Transactions on Artificial Intelligence, 5(9):4322–4343, 2024

[21]

Adversarial masked autoencoders are robust vision learners

Yuchong Yao, Nandakishor Desai, and Marimuthu Palaniswami. Adversarial masked autoencoders are robust vision learners. IEEE Transactions on Artificial Intelligence, 6(4):805–815, 2025

[22]

Active robust adversarial reinforcement learning under temporally coupled perturbations

Jiacheng Yang, Yuanda Wang, Lu Dong, Lei Xue, and Changyin Sun. Active robust adversarial reinforcement learning under temporally coupled perturbations. IEEE Transactions on Artificial Intelligence, 6(4):874–884, 2025

[23]

A membership inference and adversarial attack defense framework for network traffic classifiers

Guangrui Liu, Weizhe Zhang, Xurun Wang, Stephen King, and Shui Yu. A membership inference and adversarial attack defense framework for network traffic classifiers. IEEE Transactions on Artificial Intelligence, 6(2):317–332, 2025

[24]

Direct adversarial latent estimation to evaluate decision boundary complexity in black box models

Ashley S. Dale and Lauren Christopher. Direct adversarial latent estimation to evaluate decision boundary complexity in black box models. IEEE Transactions on Artificial Intelligence, 5(12):6043–6053, 2024

[25]

Fast is better than free: Revisiting adversarial training

Eric Wong, Leslie Rice, and J Zico Kolter. Fast is better than free: Revisiting adversarial training. arXiv preprint arXiv:2001.03994, 2020

[26]

Understanding and improving fast adversarial training

Maksym Andriushchenko and Nicolas Flammarion. Understanding and improving fast adversarial training. Advances in Neural Information Processing Systems, 33:16048–16059, 2020

[27]

Adversarial robustness through local linearization

Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, and Pushmeet Kohli. Adversarial robustness through local linearization. Advances in Neural Information Processing Systems, 32, 2019

[28]

Spatio-temporal graph-based generation and detection of adversarial false data injection evasion attacks in smart grids

Abdulrahman Takiddin, Muhammad Ismail, Rachad Atat, and Erchin Serpedin. Spatio-temporal graph-based generation and detection of adversarial false data injection evasion attacks in smart grids. IEEE Transactions on Artificial Intelligence, 5(12):6601–6616, 2024

[29]

Learning multiple layers of features from tiny images

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. University of Toronto Technical Report, 2009

[30]

Wide Residual Networks

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016

[31]

Absence of diffusion in certain random lattices

Philip W Anderson. Absence of diffusion in certain random lattices. Physical Review, 109(5):1492, 1958

[32]

The Feynman Lectures on Physics, Vol. III: Quantum Mechanics

Richard P Feynman, Robert B Leighton, and Matthew Sands. The Feynman Lectures on Physics, Vol. III: Quantum Mechanics. Addison-Wesley, 1965

[33]

Zerograd: Mitigating and explaining catastrophic overfitting in fgsm adversarial training

Zeinab Golgooni, Mehrdad Saberi, Masih Eskandar, and Mohammad Hossein Rohban. Zerograd: Mitigating and explaining catastrophic overfitting in fgsm adversarial training. arXiv preprint arXiv:2103.15476, 2021

[34]

Make some noise: Reliable and efficient single-step adversarial training

Pau de Jorge Aranda, Adel Bibi, Riccardo Volpi, Amartya Sanyal, Philip Torr, Grégory Rogez, and Puneet Dokania. Make some noise: Reliable and efficient single-step adversarial training. Advances in Neural Information Processing Systems, 35:12881–12893, 2022

[35]

Deep Learning

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016

[36]

The nature of statistical learning theory

Vladimir Vapnik. The nature of statistical learning theory. Springer Science & Business Media, 1999

[37]

Self-normalizing neural networks

Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. Self-normalizing neural networks. In Advances in Neural Information Processing Systems, pages 971–980, 2017

[38]

Gaussian Error Linear Units (GELUs)

Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415, 2016

[39]

Efficient training of low-curvature neural networks

Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, and François Fleuret. Efficient training of low-curvature neural networks. Advances in Neural Information Processing Systems, 35:25951–25964, 2022

[40]

Reading digits in natural images with unsupervised feature learning

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011

[41]

Identity mappings in deep residual networks

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14, pages 630–645. Springer, 2016

[42]

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International conference on machine learning, pages 2206–2216. PMLR, 2020

[43]

PAC-Bayesian spectrally-normalized bounds for adversarially robust generalization

Jiancong Xiao, Ruoyu Sun, and Zhi-Quan Luo. PAC-Bayesian spectrally-normalized bounds for adversarially robust generalization. Advances in Neural Information Processing Systems, 36:36305–36323, 2023

[44]

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015

[45]

Understanding catastrophic overfitting in single-step adversarial training

Hoki Kim, Woojin Lee, and Jaewook Lee. Understanding catastrophic overfitting in single-step adversarial training. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 8119–8127, 2021

[46]

Adversarial training for free!

Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! Advances in Neural Information Processing Systems, 32, 2019

[47]

Overfitting in adversarially robust deep learning

Leslie Rice, Eric Wong, and Zico Kolter. Overfitting in adversarially robust deep learning. In International Conference on Machine Learning, pages 8093–8104. PMLR, 2020

Acknowledgment

This work was supported in part by the NYUAD Center for Interacting Urban Networks (CITIES), funded by Tamkeen under the NYUAD Research Institute Award CG001, and in part b...

