Delving into transferable adversarial examples and black-box attacks

Yanpei Liu, Xinyun Chen, Chang Liu, Dawn Song · 2016 · cs.LG · arXiv 1611.02770

16 Pith papers cite this work. Polarity classification is still indexing.

abstract

An intriguing property of deep neural networks is the existence of adversarial examples, which can transfer among different architectures. These transferable adversarial examples may severely hinder deep neural network-based applications. Previous works mostly study transferability using small-scale datasets. In this work, we are the first to conduct an extensive study of transferability over large models and a large-scale dataset, and we are also the first to study the transferability of targeted adversarial examples with their target labels. We study both non-targeted and targeted adversarial examples, and show that while transferable non-targeted adversarial examples are easy to find, targeted adversarial examples generated using existing approaches almost never transfer with their target labels. Therefore, we propose novel ensemble-based approaches to generating transferable adversarial examples. Using such approaches, we observe, for the first time, a large proportion of targeted adversarial examples that transfer with their target labels. We also present some geometric studies to help understand the transferable adversarial examples. Finally, we show that the adversarial examples generated using ensemble-based approaches can successfully attack Clarifai.com, which is a black-box image classification system.
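The abstract does not spell out the ensemble objective, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a targeted attack that minimizes the negative log of a weighted average of the ensemble members' softmax probabilities for the target label, using signed-gradient steps inside an L-infinity ball. The toy linear softmax models, the function names, and the `eps`, `step`, and `iters` parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def ensemble_target_loss_and_grad(x, models, weights, target):
    """Return -log(sum_i w_i * p_i[target]) and its gradient w.r.t. x.

    Each model is a hypothetical linear softmax classifier (W, b).
    """
    q = 0.0
    grad_q = np.zeros_like(x)
    for (W, b), w in zip(models, weights):
        p = softmax(W @ x + b)
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        q += w * p[target]
        # d p[target] / d x = p[target] * W^T (e_target - p)
        grad_q += w * p[target] * (W.T @ (onehot - p))
    q = max(q, 1e-300)  # guard against log(0)
    return -np.log(q), -grad_q / q


def ensemble_targeted_attack(x, models, weights, target,
                             eps=0.5, step=0.05, iters=100):
    """Signed-gradient descent on the ensemble loss, projected onto the
    L-infinity ball of radius eps around x. Returns the best iterate seen,
    so the result is never worse than the starting point."""
    best_x = x.copy()
    best_loss, _ = ensemble_target_loss_and_grad(x, models, weights, target)
    x_adv = x.copy()
    for _ in range(iters):
        _, g = ensemble_target_loss_and_grad(x_adv, models, weights, target)
        x_adv = np.clip(x_adv - step * np.sign(g), x - eps, x + eps)
        loss, _ = ensemble_target_loss_and_grad(x_adv, models, weights, target)
        if loss < best_loss:
            best_loss, best_x = loss, x_adv.copy()
    return best_x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three white-box surrogates plus one held-out "black-box" model.
    surrogates = [(rng.normal(size=(5, 20)), rng.normal(size=5)) for _ in range(3)]
    held_out = (rng.normal(size=(5, 20)), rng.normal(size=5))
    x = rng.normal(size=20)
    x_adv = ensemble_targeted_attack(x, surrogates, [1 / 3] * 3, target=2)
    W, b = held_out
    print("held-out prediction:", int(np.argmax(W @ x_adv + b)))
```

In this toy setup the held-out model is random and unrelated to the surrogates, so there is no reason to expect the target label to transfer. The sketch only shows the mechanics of optimizing against several models at once.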

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks

cs.CR · 2026-01-20 · unverdicted · novelty 8.0

FPR manipulation attack perturbs benign MQTT packets to flip labels to attacks in NIDS with 80-100% success, increasing SOC delays without gradient-based methods.

Toy Models of Superposition

cs.LG · 2022-09-21 · accept · novelty 8.0

Toy models demonstrate that polysemanticity arises when neural networks store more sparse features than neurons via superposition, producing a phase transition tied to polytope geometry and increased adversarial vulnerability.

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

cs.CV · 2024-06-13 · unverdicted · novelty 7.0

MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.

Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples

cs.NE · 2022-09-07 · unverdicted · novelty 7.0

MDSE attack uses dynamic multi-surrogate gradient estimation to create adversarial examples that simultaneously fool SNNs, ViTs, and CNNs, with reported gains up to 91.4% on ensembles and 3x on adversarially trained SNNs versus Auto-PGD.

Local Hessian Spectral Filtering for Robust Intrinsic Dimension Estimation

cs.LG · 2026-05-02 · unverdicted · novelty 7.0

LHSD uses spectral filtering on the log-density Hessian to isolate tangent directions from noise and estimate local intrinsic dimension scalably via Stochastic Lanczos Quadrature.

Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation

cs.CV · 2025-12-11 · conditional · novelty 6.0

SAAD adaptively weights adversarial training samples by their transferability to the teacher, yielding higher AutoAttack robustness than prior distillation methods on CIFAR and Tiny-ImageNet without extra compute.

Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition

cs.LG · 2024-04-03 · unverdicted · novelty 6.0

Introduces Generative Privacy Funnel (GenPF) and deep variational PF (DVPF) models that extend the privacy funnel to generative settings and provide a controllable privacy-utility trade-off with reduced sensitive attribute leakage in face recognition.

Fooling a Real Car with Adversarial Traffic Signs

cs.CR · 2019-06-30 · unverdicted · novelty 6.0

A reproducible pipeline produces physical adversarial traffic signs that successfully attack production-grade traffic sign recognition systems in a real car under black-box conditions.

Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations

cs.CV · 2019-06-21 · unverdicted · novelty 6.0

Adversarial perturbations disrupt DNN-based face detectors under white-box, gray-box, and black-box settings to sabotage training data for AI face synthesis.

Jailbreaking Black Box Large Language Models in Twenty Queries

cs.LG · 2023-10-12 · conditional · novelty 6.0

PAIR uses an attacker LLM to iteratively craft effective jailbreak prompts for black-box target LLMs in fewer than 20 queries.

Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

JMOF is a new optimization framework for physical adversarial attacks that improves cross-model transferability and enables simultaneous attacks on multiple vision tasks such as object detection and semantic segmentation.

Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning

cs.LG · 2019-07-16 · unverdicted · novelty 5.0

Graph Laplacian interpolating activation replaces softmax in DNNs and improves natural accuracy, robust accuracy, and data efficiency.

Cellular State Transformations using Generative Adversarial Networks

q-bio.QM · 2019-06-28 · unverdicted · novelty 5.0

TSPG applies conditional GANs to generate realistic transcriptome perturbations that mimic source-to-target gene expression state transitions and highlight biologically enriched genes.

Laundering AI Authority with Adversarial Examples

cs.CR · 2026-05-05 · unverdicted · novelty 5.0

Adversarial examples enable AI authority laundering by causing production VLMs to give authoritative but wrong responses on subtly perturbed images, with success rates of 22-100% using decade-old attack methods.

Beyond Attack Success Rate: A Multi-Metric Evaluation of Adversarial Transferability in Medical Imaging Models

cs.CV · 2026-04-16 · unverdicted · novelty 4.0

Perceptual quality metrics correlate strongly with each other but show minimal correlation with attack success rate across medical imaging models and datasets, making ASR alone inadequate for assessing adversarial robustness.

SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions

cs.LG · 2026-05-12 · accept · novelty 3.0

NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.

citing papers explorer

Showing 16 of 16 citing papers.

Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks
cs.CR · 2026-01-20 · unverdicted · none · ref 75 · internal anchor

Toy Models of Superposition
cs.LG · 2022-09-21 · accept · none · ref 15

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
cs.CV · 2024-06-13 · unverdicted · none · ref 58 · internal anchor

Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples
cs.NE · 2022-09-07 · unverdicted · none · ref 21 · internal anchor

Local Hessian Spectral Filtering for Robust Intrinsic Dimension Estimation
cs.LG · 2026-05-02 · unverdicted · none · ref 274

Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
cs.CV · 2025-12-11 · conditional · none · ref 47 · internal anchor

Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition
cs.LG · 2024-04-03 · unverdicted · none · ref 133 · internal anchor

Fooling a Real Car with Adversarial Traffic Signs
cs.CR · 2019-06-30 · unverdicted · none · ref 26 · internal anchor

Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations
cs.CV · 2019-06-21 · unverdicted · none · ref 56 · internal anchor

Jailbreaking Black Box Large Language Models in Twenty Queries
cs.LG · 2023-10-12 · conditional · none · ref 52

Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework
cs.CV · 2026-05-18 · unverdicted · none · ref 39 · internal anchor

Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning
cs.LG · 2019-07-16 · unverdicted · none · ref 11 · internal anchor

Cellular State Transformations using Generative Adversarial Networks
q-bio.QM · 2019-06-28 · unverdicted · none · ref 18 · internal anchor

Laundering AI Authority with Adversarial Examples
cs.CR · 2026-05-05 · unverdicted · none · ref 36

Beyond Attack Success Rate: A Multi-Metric Evaluation of Adversarial Transferability in Medical Imaging Models
cs.CV · 2026-04-16 · unverdicted · none · ref 13

SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions
cs.LG · 2026-05-12 · accept · none · ref 55 · internal anchor
