Adversarial Examples Are Not Easily Detected
4 Pith papers cite this work.
Citing papers
- Baseline Defenses for Adversarial Attacks Against Aligned Language Models
  Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.
- Fooling a Real Car with Adversarial Traffic Signs
  A reproducible pipeline produces physical adversarial traffic signs that successfully attack production-grade traffic sign recognition systems in a real car under black-box conditions.
- Connecting Lyapunov Control Theory to Adversarial Attacks
  Connects Lyapunov control theory to a provable defense against weaker adversarial attacks on neural networks.
- Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection
  Longitudinal evaluation over yearly Android app slices shows temporal drift reduces adversarial robustness of malware detectors, with expanding-window retraining providing partial mitigation but not full recovery.