Delving into transferable adversarial examples and black-box attacks
16 Pith papers cite this work. Polarity classification is still indexing.
abstract
An intriguing property of deep neural networks is the existence of adversarial examples, which can transfer among different architectures. These transferable adversarial examples may severely hinder deep neural network-based applications. Previous works mostly study the transferability using small scale datasets. In this work, we are the first to conduct an extensive study of the transferability over large models and a large scale dataset, and we are also the first to study the transferability of targeted adversarial examples with their target labels. We study both non-targeted and targeted adversarial examples, and show that while transferable non-targeted adversarial examples are easy to find, targeted adversarial examples generated using existing approaches almost never transfer with their target labels. Therefore, we propose novel ensemble-based approaches to generating transferable adversarial examples. Using such approaches, we observe a large proportion of targeted adversarial examples that are able to transfer with their target labels for the first time. We also present some geometric studies to help understanding the transferable adversarial examples. Finally, we show that the adversarial examples generated using ensemble-based approaches can successfully attack Clarifai.com, which is a black-box image classification system.
roles
background 4
polarities
background 4
citing papers explorer
- Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks
  FPR manipulation attack perturbs benign MQTT packets to flip labels to attacks in NIDS with 80-100% success, increasing SOC delays without gradient-based methods.
- Toy Models of Superposition
  Toy models demonstrate that polysemanticity arises when neural networks store more sparse features than neurons via superposition, producing a phase transition tied to polytope geometry and increased adversarial vulnerability.
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
  MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.
- Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples
  MDSE attack uses dynamic multi-surrogate gradient estimation to create adversarial examples that simultaneously fool SNNs, ViTs, and CNNs, with reported gains up to 91.4% on ensembles and 3x on adversarially trained SNNs versus Auto-PGD.
- Local Hessian Spectral Filtering for Robust Intrinsic Dimension Estimation
  LHSD uses spectral filtering on the log-density Hessian to isolate tangent directions from noise and estimate local intrinsic dimension scalably via Stochastic Lanczos Quadrature.
- Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
  SAAD adaptively weights adversarial training samples by their transferability to the teacher, yielding higher AutoAttack robustness than prior distillation methods on CIFAR and Tiny-ImageNet without extra compute.
- Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition
  Introduces Generative Privacy Funnel (GenPF) and deep variational PF (DVPF) models that extend the privacy funnel to generative settings and provide a controllable privacy-utility trade-off with reduced sensitive attribute leakage in face recognition.
- Fooling a Real Car with Adversarial Traffic Signs
  A reproducible pipeline produces physical adversarial traffic signs that successfully attack production-grade traffic sign recognition systems in a real car under black-box conditions.
- Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations
  Adversarial perturbations disrupt DNN-based face detectors under white-box, gray-box, and black-box settings to sabotage training data for AI face synthesis.
- Jailbreaking Black Box Large Language Models in Twenty Queries
  PAIR uses an attacker LLM to iteratively craft effective jailbreak prompts for black-box target LLMs in fewer than 20 queries.
- Towards Universal Physical Adversarial Attacks via a Joint Multi-Objective and Multi-Model Optimization Framework
  JMOF is a new optimization framework for physical adversarial attacks that improves cross-model transferability and enables simultaneous attacks on multiple vision tasks such as object detection and semantic segmentation.
- Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning
  Graph Laplacian interpolating activation replaces softmax in DNNs and improves natural accuracy, robust accuracy, and data efficiency.
- Cellular State Transformations using Generative Adversarial Networks
  TSPG applies conditional GANs to generate realistic transcriptome perturbations that mimic source-to-target gene expression state transitions and highlight biologically enriched genes.
- Laundering AI Authority with Adversarial Examples
  Adversarial examples enable AI authority laundering by causing production VLMs to give authoritative but wrong responses on subtly perturbed images, with success rates of 22-100% using decade-old attack methods.
- Beyond Attack Success Rate: A Multi-Metric Evaluation of Adversarial Transferability in Medical Imaging Models
  Perceptual quality metrics correlate strongly with each other but show minimal correlation with attack success rate across medical imaging models and datasets, making ASR alone inadequate for assessing adversarial robustness.
- SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions
  NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.