Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.
Adversarial Examples Are Not Easily Detected
7 Pith papers cite this work.
citing papers explorer
- Fooling a Real Car with Adversarial Traffic Signs
  A reproducible pipeline produces physical adversarial traffic signs that successfully attack production-grade traffic sign recognition systems in a real car under black-box conditions.
- Beyond Gradient-Based Attacks: Adversarial Robustness and Explainability Stability in Cybersecurity Classifiers
  Tree-based cybersecurity classifiers exhibit decoupled prediction robustness and SHAP explanation stability under black-box attacks, quantified by a new ESI metric alongside the Robustness Index.
- Connecting Lyapunov Control Theory to Adversarial Attacks
  Connects Lyapunov control theory to a provable defense against weaker adversarial attacks on neural networks.
- Latent Geometric Chords for Query-Efficient Decision-Based Adversarial Attacks
  LGC performs curvature-aware geometric search in a compressed semantic manifold for decision-based attacks, using residual adversarial generation to reach SSIM >0.99 and LPIPS <0.01 at 5000 queries while attacking robust models.
- Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection
  Longitudinal evaluation over yearly Android app slices shows temporal drift reduces adversarial robustness of malware detectors, with expanding-window retraining providing partial mitigation but not full recovery.
- Cybersecurity is the True Frontier for Generative AI Success or Failure
  Cybersecurity's scale, adversaries, labeling issues, and operational demands make it the superior test-case for general AI progress over NLP or computer vision.