Towards Deep Learning Models Resistant to Adversarial Attacks

Adrian Vladu, Aleksandar Makelov, Aleksander Madry, Dimitris Tsipras, Ludwig Schmidt · 2017 · stat.ML · arXiv 1706.06083

Mixed citation behavior. Most common role is background (67%).

121 Pith papers citing it

abstract

Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples---inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at https://github.com/MadryLab/mnist_challenge and https://github.com/MadryLab/cifar10_challenge.
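
As a rough illustration of the first-order adversary the abstract refers to, the sketch below runs projected gradient ascent on the loss inside an l_inf ball around an input. It is a minimal toy under stated assumptions, not the authors' code (their implementations are in the repositories linked above): the linear logistic model, the function names `logistic_loss_grad` and `pgd_linf`, and the values of `eps`, `step`, and `iters` are all hypothetical choices made for this example.

```python
import math

def logistic_loss_grad(w, b, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * (w.x + b))) w.r.t. x, with y in {-1, +1}."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    # dL/dx = -y * sigmoid(-margin) * w, and sigmoid(-margin) = 1 / (1 + exp(margin)).
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]

def pgd_linf(w, b, x, y, eps=0.1, step=0.02, iters=20):
    """Toy projected gradient ascent inside the l_inf ball of radius eps around x.

    Hypothetical helper for illustration only; the model is a fixed linear classifier.
    """
    x_adv = list(x)
    for _ in range(iters):
        g = logistic_loss_grad(w, b, x_adv, y)
        # Ascent step along the sign of the input gradient.
        x_adv = [xa + step * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # Projection onto the ball: clip each coordinate to [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

if __name__ == "__main__":
    w, b = [1.0, -2.0], 0.0
    x, y = [0.5, 0.2], 1
    x_adv = pgd_linf(w, b, x, y)
    print("clean input:    ", x)
    print("perturbed input:", x_adv)
```

In adversarial training as the abstract frames it, a perturbation of this kind would be computed for each training example and the model parameters then updated on the perturbed inputs; this sketch covers only the attack side, on a model that is not being trained.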

citation-role summary

background: 21 · method: 6

citation-polarity summary

background: 18 · use method: 6 · unclear: 3

authors

Adrian Vladu, Aleksandar Makelov, Aleksander Madry, Dimitris Tsipras, Ludwig Schmidt

representative citing papers

On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models

cs.CR · 2026-05-10 · conditional · novelty 8.0

Image-to-3D models successfully generate harmful geometries in most cases with under 0.3% caught by commercial filters; existing safeguards are weak but a stacked defense cuts harmful outputs to under 1% at 11% false-positive cost.

Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle

math.OC · 2026-05-09 · unverdicted · novelty 8.0

Local LMO is a new projection-free method that achieves the convergence rates of projected gradient descent for constrained optimization by using local linear minimization oracles over small balls.

Fortifying Time Series: DTW-Certified Robust Anomaly Detection

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

First DTW-certified robust anomaly detection for time series via randomized smoothing adapted through an l_p-to-DTW lower-bound transformation.

Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks

cs.CR · 2026-01-20 · unverdicted · novelty 8.0

FPR manipulation attack perturbs benign MQTT packets to flip labels to attacks in NIDS with 80-100% success, increasing SOC delays without gradient-based methods.

When Interpretability Becomes a Liability: Adversarial Attacks on CBM Concept Layers

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

Concept-level adversarial attacks exploit CBM interpretability on the CUB dataset, but SPECTRA raises required perturbation norm from 0.46 to over 4200 while keeping accuracy loss under 2.2%.

Codec-Robust Attacks on Audio LLMs

cs.SD · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

CodecAttack perturbs audio in codec latent space with multi-bitrate EoT to achieve 85.5% average ASR on Opus-compressed Audio LLMs versus under 26% for waveform baselines, with transfer to MP3 and AAC.

Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach

cs.LG · 2026-05-19 · unverdicted · novelty 7.0

Derives ODE limits of Adam-DA showing that first- and second-order momentum parameters reverse their convergence roles in zero-sum games compared to minimization, validated on GAN experiments.

Stress-Testing Neural Network Verifiers with Provably Robust Instances

cs.LG · 2026-05-16 · conditional · novelty 7.0

A reusable framework generates verification instances with provably known robustness labels, revealing numeric tolerance issues and bugs in five verifiers while introducing difficulty profiles to diagnose failure modes.

AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

AIM is a new saliency-guided adversarial feature replacement method to evaluate faithfulness of saliency maps and reliability of masking operators on image, audio, and EEG tasks.

AuraMask: An Extensible Pipeline for Developing Aesthetic Anti-Facial Recognition Image Filters

cs.CV · 2026-05-13 · conditional · novelty 7.0

AuraMask produces 40 aesthetic anti-facial recognition filters that match or exceed prior adversarial effectiveness and achieve significantly higher user acceptance in a 630-person study.

GaitProtector: Impersonation-Driven Gait De-Identification via Training-Free Diffusion Latent Optimization

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

GaitProtector optimizes diffusion model latents to impersonate target identities in gait sequences, dropping Rank-1 identification accuracy from 89.6% to 15.0% on CASIA-B while keeping scoliosis diagnostic accuracy at 74.2%.

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

LE-SAM inverts SAM by fixing the loss budget instead of the parameter-space radius, yielding better generalization across benchmarks.

Inference Time Causal Probing in LLMs

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

HDMI is a new probe-free technique that steers LLM hidden states via margin objectives to achieve more reliable causal interventions than prior probe-based methods on standard benchmarks.

Minimum Specification Perturbation: Robustness as Distance-to-Falsification in Causal Inference

stat.ME · 2026-05-02 · unverdicted · novelty 7.0

MSP quantifies the minimum changes to analyst choices required to falsify a causal claim by making its confidence interval contain zero, providing information orthogonal to dispersion-based robustness summaries.

Quantum Interval Bound Propagation for Certified Training of Quantum Neural Networks

quant-ph · 2026-05-01 · unverdicted · novelty 7.0

QIBP adapts interval bound propagation to quantum neural networks for certified adversarial robustness via interval and affine arithmetic implementations.

Low Rank Adaptation for Adversarial Perturbation

cs.LG · 2026-04-30 · unverdicted · novelty 7.0

Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

On the Stability and Generalization of First-order Bilevel Minimax Optimization

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Provides the first systematic generalization analysis via algorithmic stability for single-timescale and two-timescale stochastic gradient descent-ascent in bilevel minimax problems.

Benign Overfitting in Adversarial Training for Vision Transformers

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Adversarial training on simplified Vision Transformers achieves benign overfitting with near-zero robust loss and generalization error when signal-to-noise ratio and perturbation budget meet specific conditions.

Physically-Induced Atmospheric Adversarial Perturbations: Enhancing Transferability and Robustness in Remote Sensing Image Classification

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

FogFool creates fog-based adversarial perturbations using Perlin noise optimization to achieve high black-box transferability (83.74% TASR) and robustness to defenses in remote sensing classification.

Learning Robustness at Test-Time from a Non-Robust Teacher

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

A test-time adaptation framework anchors adversarial training to a non-robust teacher's predictions, yielding more stable optimization and better robustness-accuracy trade-offs than standard self-consistency methods.

STRONG-VLA: Decoupled Robustness Learning for Vision-Language-Action Models under Multimodal Perturbations

cs.RO · 2026-04-11 · unverdicted · novelty 7.0

STRONG-VLA uses decoupled two-stage training to improve VLA model robustness, yielding up to 16% higher task success rates under seen and unseen perturbations on the LIBERO benchmark.

Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

cs.CR · 2026-04-08 · unverdicted · novelty 7.0

A fine-tuning framework reduces PGD attack success on AdvDA detectors from 100% to 3.2% and MalGuise from 13% to 5.1%, but optimal training strategies differ by threat model and robustness does not transfer across them.

Hidden Reliability Risks in Large Language Models: Systematic Identification of Precision-Induced Output Disagreements

cs.AI · 2026-04-02 · unverdicted · novelty 7.0

PrecisionDiff is a differential testing framework that uncovers widespread precision-induced behavioral disagreements in aligned LLMs, including safety-critical jailbreak divergences across precision formats.

citing papers explorer

Showing 21 of 121 citing papers.

Latent Adversarial Defence with Boundary-guided Generation

cs.LG · 2019-07-16 · unverdicted · none · ref 14 · internal anchor

LAD generates diverse adversarial examples in latent space by perturbing along normals to an SVM-defined decision boundary and uses them for adversarial training to improve DNN robustness.

Affine Disentangled GAN for Interpretable and Robust AV Perception

cs.CV · 2019-07-06 · unverdicted · none · ref 19 · internal anchor

ADIS-GAN disentangles affine transformations in a GAN to achieve over 98% classification accuracy on MNIST within 30 degrees rotation and over 90% under FGSM and PGD attacks while generating rotation and scaling factors.

When AI Meets Wall Street: A Survey on Trustworthy AI in Fintech

cs.CR · 2026-05-28 · unverdicted · none · ref 74 · internal anchor

A survey that proposes a lifecycle-centric framework and the Financial AI Security and Robustness Taxonomy to organize 17 attack subtypes on AI pipelines in finance.

Symmetry Defeats Auditing

cs.CR · 2026-05-27 · unverdicted · none · ref 8 · internal anchor

Symmetry enables an attack that defeats introspection adapters for auditing AI systems.

Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

cs.AI · 2026-05-18 · unverdicted · none · ref 38 · internal anchor

Argues that trustworthiness in Agent-to-Agent networks requires a new conceptual framework with four design pillars baked in from the beginning, as retrofitting existing single-agent methods is insufficient.

A No-Defense Defense Against Gradient-Based Adversarial Attacks on ML-NIDS: Is Less More?

cs.LG · 2026-05-18 · unverdicted · none · ref 3 · internal anchor

Experiments with around 2200 variations show that shallower networks with reduced features and ReLU activation reduce adversarial vulnerability in ML-NIDS and outperform deeper adversarially trained models while keeping high clean-data performance.

Real-Time Evaluation of Autonomous Systems under Adversarial Attacks

cs.AI · 2026-05-05 · unverdicted · none · ref 5 · internal anchor

A framework trains and compares MLP, transformer, and GAIL-based trajectory models on real driving data, finding that architectural differences cause large variations in robustness to PGD attacks despite similar nominal accuracy.

Beyond Attack Success Rate: A Multi-Metric Evaluation of Adversarial Transferability in Medical Imaging Models

cs.CV · 2026-04-16 · unverdicted · none · ref 23 · internal anchor

Perceptual quality metrics correlate strongly with each other but show minimal correlation with attack success rate across medical imaging models and datasets, making ASR alone inadequate for assessing adversarial robustness.

Adversarial Robustness Analysis of Cloud-Assisted Autonomous Driving Systems

cs.RO · 2026-04-06 · unverdicted · none · ref 13 · internal anchor

Adversarial attacks on cloud perception models plus network impairments in a vehicle-cloud loop degrade object detection from 0.73/0.68 to 0.22/0.15 precision/recall and destabilize closed-loop vehicle control.

The Luna Bound Propagator for Formal Analysis of Neural Networks

cs.LG · 2026-03-25 · conditional · none · ref 11 · internal anchor

Luna delivers a C++ bound propagator supporting interval, DeepPoly/CROWN, and alpha-CROWN analyses that reports tighter bounds and higher speed than the leading Python alpha-CROWN implementation on VNN-COMP 2025 benchmarks.

Using Intuition from Empirical Properties to Simplify Adversarial Training Defense

cs.LG · 2019-06-27 · unverdicted · none · ref 9 · internal anchor

Modifications to single-step adversarial training based on empirical properties of iterative methods improve accuracy by up to 16.93% against iterative attacks while reducing training cost by 28.75%.

Machine Learning Approaches for Improved Scalability of Metallic Magnetic Calorimeters

physics.ins-det · 2026-06-23 · unverdicted · none · ref 78 · internal anchor

Machine learning methods are explored for pulse classification, artifact rejection, and shape analysis in metallic magnetic calorimeters to improve scalability over traditional signal processing.

Bridging Control with Neural Network Verifier alpha-beta-CROWN: A Tutorial

eess.SY · 2026-05-26 · unverdicted · none · ref 96 · internal anchor

Tutorial introducing applications of the existing α,β-CROWN verifier to scalable formal verification of neural network controllers via bound computation and domain partitioning.

Enabling Adversarial Robustness in AI Models through Kubeflow MLOps

cs.CR · 2026-05-14 · unverdicted · none · ref 23 · internal anchor

A Kubeflow-based MLOps architecture detects FGSM adversarial attacks on deployed AI models and automatically applies PGD-based adversarial training to recover accuracy.

SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions

cs.LG · 2026-05-12 · accept · none · ref 63 · internal anchor

NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.

Enhancing Adversarial Robustness in Network Intrusion Detection: A Layer-wise Adaptive Regularization Approach

cs.CR · 2026-05-09 · unverdicted · none · ref 18 · internal anchor

LARAR enhances adversarial robustness in network intrusion detection by using layer-wise adaptive regularization and auxiliary classifiers, achieving 95.01% clean accuracy and improved defense against FGSM, PGD, and transfer attacks on UNSW-NB15.

Quantum Adversarial Machine Learning: From Classical Adaptations to Quantum-Native Methods

cs.LG · 2026-05-12 · unverdicted · none · ref 62 · internal anchor

A survey of quantum adversarial machine learning covering attacks, countermeasures, theoretical underpinnings, trends, and challenges.

Right Predictions, Misleading Explanations: On the Vulnerability of Vision-Language Model Explanations

cs.CV · 2026-05-15 · unreviewed · ref 7 · internal anchor

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

cs.LG · 2026-05-14 · unreviewed · ref 60 · internal anchor

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

cs.CL · 2026-05-12 · unreviewed · ref 155 · internal anchor

Adversarial Robustness in One-Stage Learning-to-Defer

stat.ML · 2025-10-13 · unreviewed · ref 10 · internal anchor
