pith. machine review for the scientific record.

arxiv: 2512.03730 · v2 · submitted 2025-12-03 · 💻 cs.CV · cs.AI


Out-of-the-box: Black-box Causal Attacks on Object Detectors

Authors on Pith: no claims yet
keywords: attacks, BlackCAtt, black-box, methods, detectors, object, other, when
abstract

Adversarial perturbations are a useful way to expose vulnerabilities in object detectors. Existing perturbation methods are frequently white-box, architecture-specific, and dependent on a particular loss function. More importantly, while they are often successful, it is rarely clear why they work. Insight into the mechanism of this success would allow developers to understand and analyze these attacks, as well as fine-tune the model to prevent them. This paper presents BlackCAtt, a black-box algorithm and tool that uses minimal, causally sufficient pixel sets to construct explainable, imperceptible, reproducible, architecture-agnostic attacks on object detectors. We evaluate BlackCAtt on standard benchmarks and compare it to other black-box adversarial attack methods. When BlackCAtt has access only to the position and label of a bounding box, it produces attacks comparable to or better than those produced by other black-box methods. When BlackCAtt also has access to the model confidence, it can work as a meta-algorithm, improving the ability of standard black-box techniques to construct smaller, less perceptible attacks. Because BlackCAtt manipulates only causal pixels, its attacks are fully explainable. Targeting causal pixels leads to smaller and less perceptible attacks: for example, using BlackCAtt with SquareAttack reduces the average distance ($L_0$ norm) of the attack from the original input from $0.987$ to $0.072$, while maintaining a similar success rate. We also perform ablation studies on the BlackCAtt algorithm and analyze the effect of its components on performance.
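The core idea of a minimal, causally sufficient pixel set can be sketched as a greedy black-box search: start from a perturbation over all pixels and drop every pixel whose removal does not break the attack. The sketch below uses a hypothetical `mock_detector` stand-in for a real object detector and a trivial "zero the pixel" perturbation; it illustrates the causal-sufficiency principle only and is not the paper's actual BlackCAtt algorithm.

```python
import numpy as np

def mock_detector(img):
    """Toy black-box 'detector' (hypothetical stand-in): reports a
    detection when the 3x3 centre patch is bright on average."""
    return img[3:6, 3:6].mean() > 0.5

def greedy_minimal_attack(img, detect):
    """Greedily shrink a full-image perturbation to a (locally) minimal
    pixel set that still flips the detector's output.

    Sketch of the causal-sufficiency idea: every pixel left in the
    returned set is necessary, because removing it restores detection.
    """
    assert detect(img)  # we attack an input the detector fires on

    pixels = [(r, c) for r in range(img.shape[0])
                     for c in range(img.shape[1])]
    keep = set(pixels)  # start by perturbing every pixel

    def attacked(mask):
        out = img.copy()
        for (r, c) in mask:
            out[r, c] = 0.0  # toy perturbation: zero the pixel
        return out

    assert not detect(attacked(keep))  # full perturbation must succeed

    for p in pixels:            # try dropping each pixel in turn
        trial = keep - {p}
        if not detect(attacked(trial)):  # attack still works without p?
            keep = trial                 # then p is not causally needed
    return keep
```

On a 9x9 image whose bright 3x3 centre triggers the mock detector, the search discards every pixel outside the centre and keeps only the handful of centre pixels whose zeroing is jointly sufficient to suppress the detection — a much smaller ($L_0$) perturbation than the full-image one it started from.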

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models

cs.SD · 2026-04 · unverdicted · novelty 7.0

    Transferability analysis finds that minimal sufficient signals transfer across audio models at rates varying by task, around 26% for music genre classification, with some deepfake models showing distinct behaviors not...