Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models
Many machine learning algorithms are vulnerable to almost imperceptible perturbations of their inputs. So far it was unclear how much risk adversarial perturbations carry for the safety of real-world machine learning applications because most methods used to generate such perturbations rely either on detailed model information (gradient-based attacks) or on confidence scores such as class probabilities (score-based attacks), neither of which is available in most real-world scenarios. In many such cases one currently needs to retreat to transfer-based attacks, which rely on cumbersome substitute models, need access to the training data and can be defended against. Here we emphasise the importance of attacks which solely rely on the final model decision. Such decision-based attacks (1) are applicable to real-world black-box models such as autonomous cars, (2) need less knowledge and are easier to apply than transfer-based attacks, and (3) are more robust to simple defences than gradient- or score-based attacks. Previous attacks in this category were limited to simple models or simple datasets. Here we introduce the Boundary Attack, a decision-based attack that starts from a large adversarial perturbation and then seeks to reduce the perturbation while staying adversarial. The attack is conceptually simple, requires close to no hyperparameter tuning, does not rely on substitute models and is competitive with the best gradient-based attacks in standard computer vision tasks like ImageNet. We apply the attack to two black-box algorithms from Clarifai.com. The Boundary Attack in particular and the class of decision-based attacks in general open new avenues to study the robustness of machine learning models and raise new questions regarding the safety of deployed machine learning systems. An implementation of the attack is available as part of Foolbox at https://github.com/bethgelab/foolbox .
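The abstract describes the Boundary Attack only in outline: start from a large adversarial perturbation, then shrink it while the model's decision stays wrong. The sketch below illustrates that loop in Python with NumPy against a toy linear classifier that exposes only its predicted label. It is a simplified illustration under stated assumptions, not the paper's exact procedure or the Foolbox implementation: the names `make_linear_decision_model` and `boundary_attack` are hypothetical, `delta` is kept fixed, and the `eps` adaptation (multiply by 1.05 after a success, 0.95 after a failure) is an assumed heuristic.

```python
import numpy as np


def make_linear_decision_model(w, b):
    """Hypothetical black box: returns only the predicted label, no scores or gradients."""
    def decide(x):
        return int(np.dot(w, x) + b > 0)
    return decide


def boundary_attack(decide, x_orig, x_start, steps=2000, delta=0.1, eps=0.1, seed=0):
    """Simplified untargeted Boundary Attack that queries only model decisions.

    x_start must already be adversarial (classified differently from x_orig).
    delta: relative size of the random step along the sphere (fixed here, an assumption).
    eps: relative size of the step toward x_orig (adapted by an assumed heuristic).
    """
    rng = np.random.default_rng(seed)
    y_orig = decide(x_orig)
    x_adv = np.asarray(x_start, dtype=float).copy()
    if decide(x_adv) == y_orig:
        raise ValueError("x_start must be classified differently from x_orig")

    for _ in range(steps):
        dist = np.linalg.norm(x_adv - x_orig)
        # Random step whose length is delta times the current perturbation size ...
        eta = rng.standard_normal(x_adv.shape)
        candidate = x_adv + eta * (delta * dist / np.linalg.norm(eta))
        # ... projected back onto the sphere of radius dist around x_orig,
        # so this part of the move does not change the perturbation size.
        offset = candidate - x_orig
        candidate = x_orig + offset * (dist / np.linalg.norm(offset))
        # Step toward x_orig: the perturbation shrinks by a factor (1 - eps).
        candidate = candidate + eps * (x_orig - candidate)

        # Keep the candidate only if the model's decision is still wrong.
        if decide(candidate) != y_orig:
            x_adv = candidate
            eps = min(eps * 1.05, 0.5)  # assumed heuristic: bolder after a success
        else:
            eps *= 0.95                 # assumed heuristic: more careful after a failure
    return x_adv


# Usage on a toy 10-dimensional linear model (label 1 iff sum(x) > 1).
d = 10
decide = make_linear_decision_model(np.ones(d), -1.0)
x_orig = np.zeros(d)        # classified as 0
x_start = np.zeros(d)
x_start[0] = 5.0            # classified as 1: a large adversarial perturbation
x_adv = boundary_attack(decide, x_orig, x_start)
print(decide(x_adv), np.linalg.norm(x_adv - x_orig))
```

Every accepted step shrinks the perturbation by a factor of 1 - eps, so its size never increases. The random move along the sphere is what lets the attack slide along the decision boundary toward a closer adversarial point; no gradients or class probabilities are ever queried.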
Forward citations
Cited by 10 Pith papers
- Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks
  FPR manipulation attack perturbs benign MQTT packets to flip labels to attacks in NIDS with 80-100% success, increasing SOC delays without gradient-based methods.
- Empirical Evidence for Simply Connected Decision Regions in Image Classifiers
  Empirical tests with quad-mesh filling indicate that decision regions in modern image classifiers are simply connected.
- Hard-Label Black-Box Attacks on 3D Point Clouds
  A spectrum-aware decision boundary algorithm enables effective hard-label black-box adversarial attacks on 3D point cloud models by fusing spectral information across classes and performing curvature-aware iterative o...
- Unsolved Problems in ML Safety
  The paper presents a roadmap that identifies four unsolved problems in ML safety: robustness against hazards, monitoring for hazards, alignment of model goals with human intent, and systemic safety.
- Hiding Faces in Plain Sight: Disrupting AI Face Synthesis with Adversarial Perturbations
  Adversarial perturbations disrupt DNN-based face detectors under white-box, gray-box, and black-box settings to sabotage training data for AI face synthesis.
- Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations
  MEFA enables exact full-gradient white-box attacks on iterative stochastic purification defenses like diffusion and Langevin EBMs by trading recomputation for lower memory, revealing vulnerabilities missed by approxim...
- Accelerating Targeted Hard-Label Adversarial Attacks in Low-Query Black-Box Settings
  TEA is a new targeted adversarial attack that incorporates edge information from the target image to reduce query count and improve performance in low-query black-box hard-label settings.
- Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning
  Graph Laplacian interpolating activation replaces softmax in DNNs and improves natural accuracy, robust accuracy, and data efficiency.
- Security and Privacy in Virtual and Robotic Assistive Systems: A Comparative Framework
  A unified comparative threat-modeling framework is developed to analyze security and privacy risks across virtual and robotic assistive systems.
- Quantum Adversarial Machine Learning: From Classical Adaptations to Quantum-Native Methods
  A survey of quantum adversarial machine learning covering attacks, countermeasures, theoretical underpinnings, trends, and challenges.