arxiv: 2606.18318 · v1 · pith:A2WL6QOTnew · submitted 2026-06-16 · 💻 cs.CV · cs.CR

Budget-Aware Adaptive Adversarial Patches for Black-Box Object Detection

Pith reviewed 2026-06-27 01:39 UTC · model grok-4.3

classification 💻 cs.CV cs.CR
keywords adversarial patches · black-box attacks · object detection · query efficiency · adaptive optimization · Thompson sampling · YOLO · Faster R-CNN

The pith

A budget-aware black-box attack optimizes adversarial patch location, texture, and size adaptively to suppress object detectors under query limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that uses contextual Thompson sampling to place patches and NES-style updates to optimize their texture, growing the patch size only when attack progress stalls under a query budget. It evaluates success strictly on plain images without relying on expectation over transformations for the main metric. This approach yields strong suppression on CNN-based detectors like YOLOv5 and Faster R-CNN, and substantial suppression on the transformer-based YOLOS, all while using smaller patches than fixed-size baselines and revealing query versus footprint trade-offs. A pilot study shows some physical transfer to unseen objects and viewpoints.
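
To make the texture update concrete, here is a minimal sketch of an NES-style step with antithetic sampling. It is not the paper's implementation: the function names, step size, sample count, and the toy quadratic score (a stand-in for a detector's confidence) are all assumptions made for illustration. Each step spends 2 × pairs score queries, which is the cost a query budget would have to cover.

```python
import random

def nes_gradient(loss, x, sigma=0.1, pairs=10, rng=random):
    """Antithetic NES estimate of d(loss)/dx using 2 * pairs loss queries."""
    grad = [0.0] * len(x)
    for _ in range(pairs):
        u = [rng.gauss(0.0, 1.0) for _ in x]  # random search direction
        plus = loss([xi + sigma * ui for xi, ui in zip(x, u)])
        minus = loss([xi - sigma * ui for xi, ui in zip(x, u)])
        weight = (plus - minus) / (2.0 * sigma * pairs)
        for i, ui in enumerate(u):
            grad[i] += weight * ui
    return grad

def nes_step(loss, x, lr=0.05, **kwargs):
    """One descent step on the estimated gradient, clipped to the pixel range [0, 1]."""
    grad = nes_gradient(loss, x, **kwargs)
    return [min(1.0, max(0.0, xi - lr * gi)) for xi, gi in zip(x, grad)]

# Toy stand-in for a detector score (assumption): lowest when every pixel is 0.8.
def toy_score(pixels):
    return sum((p - 0.8) ** 2 for p in pixels)

rng = random.Random(0)
patch = [0.2, 0.2, 0.2, 0.2]
start = toy_score(patch)
for _ in range(50):
    patch = nes_step(toy_score, patch, rng=rng)
final = toy_score(patch)
print(f"score before: {start:.3f}, after: {final:.3f}")
```

Only score values are used, never gradients of the model, which is what makes this kind of update usable in a score-based black-box setting.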

Core claim

The central discovery is that coupling a lightweight Contextual Thompson-Sampling placer with NES-style pixel updates and an adaptive growth rule allows effective black-box attacks on object detectors with compact patches under tight query budgets, as measured by plain-image suppression on YOLOv5, Faster R-CNN, and YOLOS.
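
The placement component can be illustrated with a deliberately simplified contextual Thompson sampler. The abstract does not give the paper's formulation, so everything here is an assumption: candidate locations are cells of a grid, the context is a coarse hashable label (for example an object-size bucket), the reward is binary, and each (context, cell) pair keeps its own Beta posterior. The class name `ThompsonPlacer` and the simulated success rates are hypothetical.

```python
import random
from collections import Counter, defaultdict

class ThompsonPlacer:
    """Beta-Bernoulli Thompson sampling with one posterior per (context, cell)."""

    def __init__(self, n_cells, seed=0):
        self.n_cells = n_cells
        self.rng = random.Random(seed)
        self.wins = defaultdict(lambda: 1.0)    # Beta alpha, uniform prior
        self.losses = defaultdict(lambda: 1.0)  # Beta beta, uniform prior

    def choose(self, context):
        # Draw one plausible success rate per cell, then act greedily on the draws.
        draws = [
            self.rng.betavariate(self.wins[(context, c)], self.losses[(context, c)])
            for c in range(self.n_cells)
        ]
        return max(range(self.n_cells), key=draws.__getitem__)

    def update(self, context, cell, success):
        if success:
            self.wins[(context, cell)] += 1.0
        else:
            self.losses[(context, cell)] += 1.0

# Hypothetical ground truth: chance that a patch in each cell lowers the score.
TRUE_RATE = {"small": [0.1, 0.2, 0.8, 0.3], "large": [0.7, 0.2, 0.1, 0.1]}

placer = ThompsonPlacer(n_cells=4)
env = random.Random(1)
pulls = {ctx: Counter() for ctx in TRUE_RATE}
for _ in range(300):
    for ctx, rates in TRUE_RATE.items():
        cell = placer.choose(ctx)
        placer.update(ctx, cell, env.random() < rates[cell])
        pulls[ctx][cell] += 1

favourite = {ctx: pulls[ctx].most_common(1)[0][0] for ctx in pulls}
print(favourite)
```

Because sampling from the posterior explores uncertain cells early and concentrates on good ones later, the placer spends few queries on unpromising locations, which is the property a tight query budget calls for.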

What carries the argument

Contextual Thompson-Sampling placer paired with adaptive patch growth under query budget
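
The growth rule can be sketched as a small control loop: keep optimizing at the current size, and enlarge the patch only after several consecutive steps without meaningful improvement. The patience, tolerance, size increments, and the toy `toy_step` dynamics (where a larger patch allows a lower achievable score) are assumptions chosen for illustration, not values from the paper.

```python
def run_budgeted_attack(step_fn, size=8, max_size=32, grow_by=4, patience=3,
                        tol=1e-3, budget=200, queries_per_step=2,
                        success_below=0.1):
    """Optimize at the current size; grow only after `patience` stalled steps."""
    score, used, stalled = 1.0, 0, 0
    while used + queries_per_step <= budget:
        new_score = step_fn(score, size)
        used += queries_per_step
        stalled = stalled + 1 if score - new_score < tol else 0
        score = new_score
        if score < success_below:
            return {"success": True, "size": size, "queries": used}
        if stalled >= patience and size < max_size:
            size = min(max_size, size + grow_by)  # progress stalled: enlarge patch
            stalled = 0
    return {"success": False, "size": size, "queries": used}

# Toy dynamics (assumption): each step halves the gap to a size-dependent floor.
def toy_step(score, size):
    floor = max(0.0, 1.0 - size / 24.0)
    return floor + 0.5 * (score - floor)

adaptive = run_budgeted_attack(toy_step, size=8)
fixed = run_budgeted_attack(toy_step, size=32)  # fixed-size baseline at max size
print("adaptive:", adaptive)
print("fixed:   ", fixed)
```

In this toy setting the adaptive run ends with a smaller patch but spends more queries than the fixed-size run, which is one way to picture the query versus footprint trade-off the summary refers to.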

If this is right

  • The attack achieves strong suppression on CNN-based detectors such as YOLOv5 and Faster R-CNN.
  • It achieves substantial suppression on the transformer-based YOLOS detector.
  • It exposes clear query-footprint trade-offs relative to fixed-size and heuristic baselines.
  • A print-capture pilot shows transfer across unseen physical objects and viewpoints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the adaptive sizing rule generalizes, real-world patches could become harder to detect visually while remaining effective.
  • The separation of plain-image success from EOT robustness may encourage future evaluations to report both metrics independently.
  • The query-budget coupling suggests that similar adaptive mechanisms could apply to other black-box optimization tasks in vision.
  • Testing the method on additional unseen physical capture conditions would clarify the limits of the observed transfer.

Load-bearing premise

The strict plain-image suppression test without EOT, combined with the adaptive growth rule, produces a reliable measure of attack success that generalizes beyond the evaluated models and datasets.
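
The difference between the plain-image test and an EOT audit can be sketched as two separate measurements. This is an illustrative reading of the abstract, not the paper's evaluation code: the confidence threshold, the function names, and the toy renderer and detector are assumptions. The key point is that success is decided on the untransformed image alone, while the share of transformations that preserve suppression is reported on the side.

```python
def is_suppressed(detections, target, conf_thresh=0.5):
    """True when no detection of `target` reaches the confidence threshold."""
    return not any(label == target and score >= conf_thresh
                   for label, score in detections)

def plain_image_success(detect, render, target, conf_thresh=0.5):
    """Primary metric: the patched image is scored once, with no transformation."""
    return is_suppressed(detect(render(None)), target, conf_thresh)

def eot_survival_rate(detect, render, transforms, target, conf_thresh=0.5):
    """Audit only: share of transformations under which suppression still holds."""
    held = sum(is_suppressed(detect(render(t)), target, conf_thresh)
               for t in transforms)
    return held / len(transforms)

# Toy stand-ins (assumptions): rendering yields an effective patch strength that
# scales with a brightness gain, and the detector's score drops with strength.
def toy_render(gain):
    return 1.0 if gain is None else gain

def toy_detect(strength):
    return [("person", max(0.0, 0.9 - 0.6 * strength))]

plain_ok = plain_image_success(toy_detect, toy_render, "person")
survival = eot_survival_rate(toy_detect, toy_render, [0.5, 0.8, 1.0, 1.2], "person")
print(plain_ok, survival)
```

Keeping the two functions separate mirrors the reporting choice described above: a patch can pass the plain-image test while surviving only a fraction of transformations, and neither number substitutes for the other.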

What would settle it

The claim would be settled by testing whether the method maintains similar suppression rates and query efficiency on a detector architecture or a dataset not used in the original evaluation.

Original abstract

Adversarial patches pose a practical threat to modern object detectors. Prior work shows vulnerability, but three gaps limit actionable insight: (i) few \emph{score-based black-box} attacks \emph{jointly} optimize patch \emph{location, texture, and size} under tight query budgets; (ii) success is rarely tied to the patch's \emph{visual footprint}; and (iii) evaluations often conflate EOT robustness with plain-view suppression. We present \method{}, a query-efficient, budget-adaptive black-box attack that couples a lightweight \emph{Contextual Thompson-Sampling} placer with NES-style pixel updates, growing the patch only when progress stalls. Reporting is anchored by a \emph{strict plain-image} suppression test; EOT is audited but never used as a substitute for success, and optional appearance/printability weights expose strength--visibility trade-offs. Across YOLOv5, Faster R-CNN, and YOLOS, \method{} achieves strong suppression on CNN-based detectors and substantial suppression on the transformer-based detector, using compact patches and exposing clear query--footprint trade-offs relative to fixed-size and heuristic baselines. A print--capture pilot further shows transfer across unseen physical objects and viewpoints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents BAP (Budget-Aware Adaptive Patches), a score-based black-box attack that jointly optimizes adversarial patch location, texture, and size under a query budget. It couples a Contextual Thompson-Sampling placer with NES-style pixel updates and grows the patch only when progress stalls. Success is defined by a strict plain-image suppression test (EOT audited separately) on YOLOv5, Faster R-CNN, and YOLOS. The manuscript claims strong suppression on the CNN detectors, substantial suppression on the transformer detector, compact patches, and explicit query-footprint trade-offs versus fixed-size and heuristic baselines, and it adds a print-capture physical pilot.

Significance. If the reported suppression rates and trade-offs hold under the stated conditions, the work supplies a concrete, budget-constrained baseline for multi-parameter black-box patch attacks and explicitly decouples plain-view efficacy from EOT robustness, which could usefully inform both attack construction and defense evaluation in practical settings.

major comments (2)
  1. [Abstract] The central performance claims rest on the plain-image suppression metric without EOT. Because the Contextual Thompson-Sampling placer and stall-triggered growth rule are optimized directly against the three evaluated models on their training distributions, the metric risks capturing model-specific decision-boundary exploits rather than generalizable attack strength; this is load-bearing for the assertions of 'strong suppression' and 'clear query-footprint trade-offs'.
  2. [Abstract] No equations, component ablations, or error bars are supplied to quantify the contribution of the placer, the NES updater, or the adaptive growth rule, preventing verification that the reported suppression numbers are attributable to the proposed method rather than baseline behavior.
minor comments (1)
  1. [Abstract] The method is introduced only as \method{} with no expanded name or acronym on first use.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, clarifying the role of the plain-image metric and the availability of methodological details while committing to targeted revisions where appropriate.

  1. Referee: [Abstract] The central performance claims rest on the plain-image suppression metric without EOT. Because the Contextual Thompson-Sampling placer and stall-triggered growth rule are optimized directly against the three evaluated models on their training distributions, the metric risks capturing model-specific decision-boundary exploits rather than generalizable attack strength; this is load-bearing for the assertions of 'strong suppression' and 'clear query-footprint trade-offs'.

    Authors: Black-box attacks are by definition optimized against the queried model, and our evaluation deliberately spans CNN-based (YOLOv5, Faster R-CNN) and transformer-based (YOLOS) detectors while reporting explicit query-footprint trade-offs against fixed-size and heuristic baselines. The manuscript explicitly decouples plain-view suppression from EOT (audited separately) and includes a print-capture physical pilot demonstrating transfer to unseen objects and viewpoints. We agree that the scope of generalizability should be stated more explicitly and will add a dedicated paragraph in the discussion clarifying that the reported numbers characterize performance under the stated query budgets and model distributions rather than claiming universal transfer. revision: partial

  2. Referee: [Abstract] No equations, component ablations, or error bars are supplied to quantify the contribution of the placer, the NES updater, or the adaptive growth rule, preventing verification that the reported suppression numbers are attributable to the proposed method rather than baseline behavior.

    Authors: The abstract is a concise summary and therefore omits equations and ablations; the full manuscript (Section 3) provides the Contextual Thompson-Sampling formulation, the NES-style pixel update rule, and the stall-triggered growth logic, while Section 4 reports comparisons against fixed-size and heuristic baselines. To directly address the request for component-level quantification, we will add a new ablation table and error bars (from repeated runs) in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical attack evaluation with no self-referential derivations

The paper presents an empirical black-box attack method (Contextual Thompson-Sampling placer + NES-style updates + stall-triggered growth) evaluated via plain-image suppression on three detectors. No equations, uniqueness theorems, or fitted parameters are shown that reduce any reported success metric to the method's own inputs by construction. Claims rest on direct experimental reporting against external models and datasets, with EOT audited separately but not substituted into the primary metric. This is self-contained empirical work with no load-bearing self-citations or ansatzes in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities can be extracted or audited.

pith-pipeline@v0.9.1-grok · 5764 in / 1015 out tokens · 21237 ms · 2026-06-27T01:39:03.219937+00:00 · methodology


Reference graph

Works this paper leans on

27 extracted references · 8 canonical work pages · 5 internal anchors

  1. [1]

    Budget-Aware Adaptive Adversarial Patches for Black-Box Object Detection

    INTRODUCTION. In critical applications such as autonomous driving, robotics, and surveillance, computer vision systems increasingly depend on object detectors to detect and classify objects within a scene. These detectors span one-stage CNN architectures such as YOLO [1], two-stage CNN detectors such as Faster R-CNN [2], and transformer-based detectors ...

  2. [2]

    Bandits & Priors,

    RELATED WORK. Adversarial Patches and Physical Attacks on Vision Models. Adversarial patches are localized visual triggers that induce model misbehavior when inserted into a scene. Brown et al. introduced universal, printable patches that drive classifiers toward a chosen target under varied viewing conditions [5]. Liu et al. extended this idea to object det...

  3. [3]

    move to the same score-only interface but use a GAN latent search that lowers Tiny-YOLOv3 average precision to 23% at a cost of ∼1.1×10^5 queries. In the same setting, PATCHBANDIT suppresses the original top class on 77.5% of images with a median of 49 calls and an 8.3% patch area, over three orders of magnitude fewer queries and a smaller footprint....

  4. [4]

    METHODOLOGY. Algorithm 1 summarizes PATCHBANDIT, a query-bounded black-box patch attack that jointly optimizes (i) where to place a patch, (ii) the patch texture, and (iii) the patch size. Each iteration selects a candidate location via contextual Thompson sampling, updates patch pixels with NES-style zeroth-order optimization, and grows the patch when pro...

  5. [5]

    EVALUATION. 4.1. Digital Results. Experimental palette. We evaluate 18 controlled variants of PATCHBANDIT that each toggle a single design knob (growth patience, query schedule, context masking, budget stress tests, stealth bias, EOT robustness, and fixed-size controls). Unless stated otherwise, results use the same protocol as the baseline; selected setting...

  6. [6]

    CONCLUSION. PATCHBANDIT combines contextual Thompson-sampling placement, NES zeroth-order updates, and progress-triggered growth into a score-only attack loop. Using a strict plain-view suppression criterion and reporting EOT separately, it achieves strong suppression across CNN-based detectors and substantial suppression on a transformer-based detector, w...

  7. [7]

    You only look once: Unified, real-time object detection,

    Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788

  8. [8]

    Faster r-cnn: Towards real-time object detection with region proposal networks,

    Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” Advances in neural information processing systems, vol. 28, 2015

  9. [9]

    You only look at one sequence: Rethinking transformer in vision through object detection,

    Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, and Wenyu Liu, “You only look at one sequence: Rethinking transformer in vision through object detection,” Advances in Neural Information Processing Systems, vol. 34, pp. 26183–26197, 2021

  10. [10]

    Intriguing properties of neural networks

    Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013

  11. [11]

    Adversarial Patch

    Tom B Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer, “Adversarial patch,” arXiv preprint arXiv:1712.09665, 2017

  12. [12]

    DPatch: An Adversarial Patch Attack on Object Detectors

    Xin Liu, Huanrui Yang, Ziwei Liu, Linghao Song, Hai Li, and Yiran Chen, “Dpatch: An adversarial patch attack on object detectors,” preprint arXiv:1806.02299, 2018

  13. [13]

    Fooling automated surveillance cameras: Adversarial patches to attack person detection,

    Lien Thys, Wiebe Van Ranst, and Toon Goedemé, “Fooling automated surveillance cameras: Adversarial patches to attack person detection,” in CVPR Workshops, 2019

  14. [14]

    Black-box adversarial attacks with limited queries and information,

    Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin, “Black-box adversarial attacks with limited queries and information,” in International conference on machine learning. PMLR, 2018, pp. 2137–2146

  15. [15]

    Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors

    Andrew Ilyas, Logan Engstrom, and Aleksander Madry, “Prior convictions: Black-box adversarial attacks with bandits and priors,” preprint arXiv:1807.07978, 2018

  16. [16]

    Parallel rectangle flip attack: A query-based black-box attack against object detection,

    Siyuan Liang, Baoyuan Wu, Yanbo Fan, Xingxing Wei, and Xiaochun Cao, “Parallel rectangle flip attack: A query-based black-box attack against object detection,” arXiv preprint arXiv:2201.08970, 2022

  17. [17]

    Simultaneously optimizing perturbations and positions for black-box adversarial patch attacks,

    Xingxing Wei, Ying Guo, Jie Yu, and Bo Zhang, “Simultaneously optimizing perturbations and positions for black-box adversarial patch attacks,” IEEE transactions on pattern analysis and machine intelligence, vol. 45, no. 7, pp. 9041–9054, 2022

  18. [18]

    Patch of invisibility: Naturalistic physical black-box adversarial attacks on object detectors,

    Raz Lapid, Eylon Mizrahi, and Moshe Sipper, “Patch of invisibility: Naturalistic physical black-box adversarial attacks on object detectors,” arXiv preprint arXiv:2303.04238, 2023

  19. [19]

    Synthesizing robust adversarial examples,

    Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok, “Synthesizing robust adversarial examples,” in International conference on machine learning. PMLR, 2018, pp. 284–293

  20. [20]

    Patchattack: A black-box texture-based attack with reinforcement learning,

    Chenglin Yang, Adam Kortylewski, Cihang Xie, Yinzhi Cao, and Alan Yuille, “Patchattack: A black-box texture-based attack with reinforcement learning,” in European Conference on Computer Vision. Springer, 2020

  21. [21]

    The translucent patch: A physical and universal attack on object detectors,

    Alon Zolfi, Moshe Kravchik, Yuval Elovici, and Asaf Shabtai, “The translucent patch: A physical and universal attack on object detectors,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 15232–15241

  22. [22]

    Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models,

    Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh, “Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models,” in Proceedings of the 10th ACM workshop on artificial intelligence and security, 2017, pp. 15–26

  23. [23]

    Black-box adversarial attacks with limited queries and information,

    Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin, “Black-box adversarial attacks with limited queries and information,” in ICML, 2018

  24. [24]

    Square attack: a query-efficient black-box adversarial attack via random search,

    Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein, “Square attack: a query-efficient black-box adversarial attack via random search,” in European conference on computer vision. Springer, 2020, pp. 484–501

  25. [25]

    Universal physical camouflage attacks on object detectors,

    Lifeng Huang, Chengying Gao, Yuyin Zhou, Cihang Xie, Alan L Yuille, Changqing Zou, and Ning Liu, “Universal physical camouflage attacks on object detectors,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 720–729

  26. [26]

    Bb-patch: Blackbox adversarial patch-attack using zeroth-order optimization,

    Satyadwyoom Kumar, Saurabh Gupta, and Arun Balaji Buduru, “Bb-patch: Blackbox adversarial patch-attack using zeroth-order optimization,” arXiv preprint arXiv:2405.06049, 2024

  27. [27]

    Robust physical-world attacks on deep learning visual classification,

    Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song, “Robust physical-world attacks on deep learning visual classification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1625–1634