Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency

Bo Han; Chaojian Yu; Hang Su; Runqi Lin; Tongliang Liu

arxiv: 2405.16262 · v2 · pith:7D4TG4K5new · submitted 2024-05-25 · 💻 cs.LG

Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency

Runqi Lin , Chaojian Yu , Bo Han , Hang Su , Tongliang Liu This is my paper

classification 💻 cs.LG

keywords layersadversarialpseudo-robustshortcutsanalysisattacksboundariescatastrophic

0 comments

read the original abstract

Catastrophic overfitting (CO) presents a significant challenge in single-step adversarial training (AT), manifesting as highly distorted deep neural networks (DNNs) that are vulnerable to multi-step adversarial attacks. However, the underlying factors that lead to the distortion of decision boundaries remain unclear. In this work, we delve into the specific changes within different DNN layers and discover that during CO, the former layers are more susceptible, experiencing earlier and greater distortion, while the latter layers show relative insensitivity. Our analysis further reveals that this increased sensitivity in former layers stems from the formation of pseudo-robust shortcuts, which alone can impeccably defend against single-step adversarial attacks but bypass genuine-robust learning, resulting in distorted decision boundaries. Eliminating these shortcuts can partially restore robustness in DNNs from the CO state, thereby verifying that dependence on them triggers the occurrence of CO. This understanding motivates us to implement adaptive weight perturbations across different layers to hinder the generation of pseudo-robust shortcuts, consequently mitigating CO. Extensive experiments demonstrate that our proposed method, Layer-Aware Adversarial Weight Perturbation (LAP), can effectively prevent CO and further enhance robustness.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SORA: Free Second-Order Attacks in Fast Adversarial Training
cs.LG 2026-05 unverdicted novelty 5.0

SORA is an adaptive step-size adversarial training algorithm that formalizes epsilon overfitting, introduces the PertAlign metric to predict catastrophic overfitting, and dynamically adjusts perturbations to achieve s...