pith.

arxiv: 2601.14300 · v3 · pith:OOSBYBIJ · submitted 2026-01-17 · 💻 cs.LG · cs.CR

Low-Cost Hard-Label Adversarial Attack with Theoretical Foundations

Pith reviewed 2026-05-25 07:09 UTC · model grok-4.3

classification 💻 cs.LG cs.CR
keywords hard-label attacks · black-box adversarial attacks · query-efficient attacks · gradient sign approximation · theoretical attack framework · Pattern-Driven Optimization · zero-query initialization

The pith

Existing sign-flipping hard-label attacks can be read as approximations of the true gradient sign; that reading directly yields a zero-query initialization and a Pattern-Driven Optimization routine that lower query counts while raising success rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a single theoretical account in which prior hard-label methods that flip the signs of blocks of the perturbation direction are reinterpreted as noisy estimates of the sign of the true gradient. From that account it derives an initialization that requires zero queries and an optimization routine, Pattern-Driven Optimization (PDO), whose query complexity is provably lower than that of baseline search methods. Experiments on CIFAR-10, ImageNet, ObjectNet, commercial APIs, CLIP, and segmentation tasks show that the resulting attack reaches higher success rates at low query budgets than prior hard-label methods and evades the stateful detector Blacklight. A reader should care because hard-label access is the most realistic threat model for deployed systems, so any reduction in its cost changes the practical security margin.

Core claim

We establish a unified theoretical framework showing that existing sign-flipping hard-label attacks can be understood as approximating the true gradient sign. Guided by this, we propose a novel attack framework featuring a zero-query initialization strategy and a Pattern-Driven Optimization (PDO) algorithm. We provide theoretical guarantees that our initialization yields higher cosine similarity to the true gradient sign than random baselines, and our PDO module achieves significantly lower query complexity than baseline search methods.

What carries the argument

The unified theoretical framework that recasts sign-flipping hard-label attacks as approximations to the true gradient sign, which then supplies both the zero-query initialization and the Pattern-Driven Optimization (PDO) routine.
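To make the "sign-flipping as gradient-sign approximation" reading concrete, here is a minimal sketch under stated assumptions. It is not the paper's method or its PDO routine: it is a generic RayS-style hierarchical sign-flipping search (Chen and Gu [7]) run against a hypothetical linear classifier that reveals only its top-1 label. Each candidate direction d ∈ {−1, +1}^n is scored by a binary search, using hard labels only, for the L∞ distance to the decision boundary, and a block flip is kept only if it shortens that distance. On this toy model the search ends at sign(−w), the sign of the gradient of the attack loss −(w·x + b). The model, `t_hi`, `tol`, and every function name are illustrative assumptions.

```python
import numpy as np


def make_oracle(w, b):
    """Hard-label oracle for a toy linear classifier: only the top-1 label (0 or 1) is revealed."""
    stats = {"queries": 0}

    def oracle(x):
        stats["queries"] += 1
        return int(w @ x + b > 0)

    return oracle, stats


def boundary_distance(oracle, x0, y0, d, t_hi=10.0, tol=1e-9):
    """Smallest step t (up to tol) with oracle(x0 + t*d) != y0, found by binary search.
    Returns inf if the label does not change within t_hi."""
    if oracle(x0 + t_hi * d) == y0:
        return np.inf
    lo, hi = 0.0, t_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if oracle(x0 + mid * d) == y0:
            lo = mid
        else:
            hi = mid
    return hi


def hierarchical_sign_flip(oracle, x0, seed=0):
    """Flip the sign of one block at a time, keep flips that shorten the boundary
    distance, and halve the block size after each sweep (down to single coordinates)."""
    rng = np.random.default_rng(seed)
    y0 = oracle(x0)
    d = rng.choice([-1.0, 1.0], size=x0.size)
    best = boundary_distance(oracle, x0, y0, d)
    n_blocks = 1
    while n_blocks <= x0.size:
        for idx in np.array_split(np.arange(x0.size), n_blocks):
            cand = d.copy()
            cand[idx] *= -1.0
            dist = boundary_distance(oracle, x0, y0, cand)
            if dist < best:
                d, best = cand, dist
        n_blocks *= 2
    return d, best


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(1)
w, b = rng.normal(size=64), 0.5   # hypothetical toy model
x0 = np.zeros(64)                 # clean input with label 1 (margin 0.5)
oracle, stats = make_oracle(w, b)
d, dist = hierarchical_sign_flip(oracle, x0)
true_sign = np.sign(-w)           # sign of the gradient of the attack loss -(w @ x + b)
print(cosine(d, true_sign), dist, stats["queries"])
```

On a linear model the final coordinate-level sweep recovers the gradient sign exactly; on a real network the boundary is only locally linear, which is the regime the referee report questions.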

If this is right

  • The zero-query initialization achieves measurably higher cosine similarity to the true gradient sign than random baselines.
  • Pattern-Driven Optimization reduces query complexity compared with baseline search methods.
  • The full attack achieves higher success rates than prior hard-label methods under low query budgets on CIFAR-10, ImageNet, and ObjectNet.
  • The attack maintains high success on adversarially trained models, commercial APIs, CLIP, ImageNet-C, PathMNIST, and segmentation tasks.
  • The attack evades the stateful defense Blacklight at a 0% detection rate.
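The last point refers to Blacklight [36], a stateful defense that fingerprints incoming queries and flags queries that are near-duplicates of earlier ones. The sketch below is a simplified, hypothetical detector in that spirit, assuming the published design roughly as we understand it: quantize pixel values, hash sliding windows of the flattened image with SHA-256, keep the top-k hashes as a fingerprint, and flag a query whose fingerprint shares at least `threshold` hashes with a stored one. The class name and all defaults (`quant_step`, `window`, `top_k`, `threshold`) are illustrative assumptions, not Blacklight's reference implementation or the settings used in the paper.

```python
import hashlib

import numpy as np


class BlacklightStyleDetector:
    """Toy stateful query detector in the spirit of Blacklight; defaults are assumptions."""

    def __init__(self, quant_step=50, window=20, stride=1, top_k=50, threshold=25):
        self.quant_step = quant_step  # coarse quantization so tiny edits map to the same bucket
        self.window = window          # number of flattened pixel values per hashed segment
        self.stride = stride
        self.top_k = top_k            # fingerprint size
        self.threshold = threshold    # matching hashes needed to flag a query
        self.history = []             # fingerprints of all earlier queries

    def fingerprint(self, img):
        q = (np.asarray(img, dtype=np.int64) // self.quant_step).ravel()
        hashes = {
            hashlib.sha256(q[i:i + self.window].tobytes()).hexdigest()
            for i in range(0, q.size - self.window + 1, self.stride)
        }
        # Keep a deterministic subset: the top_k hash strings in descending order.
        return set(sorted(hashes, reverse=True)[: self.top_k])

    def query(self, img):
        """Return True if this query matches an earlier one closely enough, then store it."""
        fp = self.fingerprint(img)
        flagged = any(len(fp & old) >= self.threshold for old in self.history)
        self.history.append(fp)
        return flagged


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
det = BlacklightStyleDetector()
print(det.query(img), det.query(img))  # False True
```

Under this kind of check, query sequences that make many small edits to one image share most windowed hashes and collide, which gives one plausible intuition for the evasion result; the 0% detection rate is the paper's own measurement, not something this toy reproduces.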

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Defense designers may need to monitor or randomize initial perturbations rather than only the optimization trajectory.
  • The same gradient-sign view could be applied to other label-only or score-only threat models beyond images.
  • Lower per-example query counts make rate-limited production APIs more vulnerable than previously measured.
  • The framework supplies a concrete way to compare future hard-label methods by their cosine similarity to the true gradient sign rather than by success rate alone.

Load-bearing premise

That modeling existing attacks as gradient-sign approximations produces initialization and optimization choices that improve performance on real neural networks.

What would settle it

An experiment in which the proposed initialization fails to produce higher cosine similarity to the true gradient sign than random initialization, or in which PDO fails to reduce query complexity relative to standard search baselines on the same models.
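One way to run the first half of this test is sketched below, assuming white-box access to a local model so that the true gradient is available for evaluation (the attack itself would not see it). The metric is the cosine similarity between an initialization's sign pattern and the sign of the true gradient; the random baseline draws each sign uniformly from {−1, +1}. The stand-in gradient here is synthetic Gaussian noise, and the dimension 3 × 32 × 32 = 3072 is simply CIFAR-10's input size; in a real evaluation `grad` would come from backpropagation and the initialization from the method under test. Function names are hypothetical.

```python
import numpy as np


def sign_cosine(init, grad):
    """Cosine similarity between sign(init) and sign(grad), both flattened."""
    a = np.sign(np.ravel(init)).astype(float)
    b = np.sign(np.ravel(grad)).astype(float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def random_sign_baseline(grad, trials=1000, seed=0):
    """Mean and standard deviation of sign_cosine over uniformly random +/-1 initializations."""
    rng = np.random.default_rng(seed)
    t = np.sign(np.ravel(grad)).astype(float)
    s = rng.choice([-1.0, 1.0], size=(trials, t.size))
    cos = s @ t / (np.linalg.norm(s, axis=1) * np.linalg.norm(t))
    return float(cos.mean()), float(cos.std())


dim = 3 * 32 * 32
grad = np.random.default_rng(1).normal(size=dim)  # synthetic stand-in for a true gradient
mean, std = random_sign_baseline(grad)
print(f"random baseline: mean={mean:.4f}, std={std:.4f}, 1/sqrt(dim)={1 / np.sqrt(dim):.4f}")
```

A candidate initialization only beats the random baseline meaningfully if its score sits several standard deviations (about 1/√dim) above zero, and the comparison should be repeated across many inputs rather than a single image.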

Figures

Figures reproduced from arXiv: 2601.14300 by Fengpeng Li, Isao Echizen, Jiantao Zhou, Jun Liu, Leo Yu Zhang.

Figure 1. Comparison of our approach with traditional meth…
Figure 2. Framework of the proposed hard-label attack method DPAttack. (a) Stage 1: The Dynamic Decision-Making (DDM)…
Figure 3. Classification sensitivity analysis of frequency…
Figure 4. Clean image frequency statistics. The solid line and…
Figure 5. (a) Cosine similarity between the true gradient sign…
Figure 6. Evolution of cosine similarity between the true gra…
Figure 7. Empirical validation of Theorem 5 across various…
Figure 8. The influence of block size 𝑤 used in BDCT. … leads to significant fluctuations in query efficiency, with the Avg.Q varying by over 100 as shown in …
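Figure 8 refers to a block DCT (BDCT) with block size 𝑤. The captions do not define it further, so the sketch below assumes the JPEG-style reading [47]: split a channel into non-overlapping 𝑤 × 𝑤 tiles and apply an orthonormal 2-D DCT-II to each tile. It uses only NumPy; the function names are hypothetical, and it assumes the image height and width are divisible by 𝑤.

```python
import numpy as np


def dct_matrix(w):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector of length w."""
    n = np.arange(w)
    k = n[:, None]
    m = np.sqrt(2.0 / w) * np.cos(np.pi * (2 * n[None, :] + 1) * k / (2 * w))
    m[0, :] = np.sqrt(1.0 / w)
    return m


def bdct(x, w=8):
    """Blockwise 2-D DCT of an (H, W) array with H and W divisible by w."""
    height, width = x.shape
    c = dct_matrix(w)
    tiles = x.reshape(height // w, w, width // w, w).transpose(0, 2, 1, 3)  # (rows, cols, w, w)
    coeffs = c @ tiles @ c.T                                                # C X C^T per tile
    return coeffs.transpose(0, 2, 1, 3).reshape(height, width)


def ibdct(coeffs, w=8):
    """Inverse of bdct: the basis is orthonormal, so the inverse uses its transpose."""
    height, width = coeffs.shape
    c = dct_matrix(w)
    tiles = coeffs.reshape(height // w, w, width // w, w).transpose(0, 2, 1, 3)
    return (c.T @ tiles @ c).transpose(0, 2, 1, 3).reshape(height, width)


x = np.random.default_rng(0).random((32, 32))
y = bdct(x, w=8)
print(np.allclose(ibdct(y, w=8), x))  # True
```

Under this reading, 𝑤 fixes the spatial extent and frequency resolution of every coefficient, which is one plausible reason the choice of 𝑤 affects query efficiency as Figure 8 reports.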
Original abstract

Hard-label black-box attacks, relying solely on top-1 predictions, represent one of the most challenging yet practically relevant threat models. Despite recent progress, existing approaches face two key limitations: (1) they overlook the critical role of initialization, focusing primarily on optimization strategies; and (2) they rely heavily on empirical heuristics without theoretical guarantees. To bridge this gap, we establish a unified theoretical framework showing that existing sign-flipping hard-label attacks can be understood as approximating the true gradient sign. Guided by this principled analysis, we propose a novel attack framework featuring a zero-query initialization strategy and a Pattern-Driven Optimization (PDO) algorithm. We provide theoretical guarantees that our initialization yields higher cosine similarity to the true gradient sign than random baselines, and our PDO module achieves significantly lower query complexity than baseline search methods. Extensive experiments across CIFAR-10, ImageNet, and ObjectNet, covering standard and adversarially trained models, commercial APIs, and CLIP models, demonstrate that our method consistently outperforms SOTA hard-label attacks in both success rate and efficiency, particularly under low query budgets. Furthermore, our method demonstrates robust generalization across corrupted data (ImageNet-C), biomedical images (PathMNIST), and dense prediction tasks such as segmentation. Notably, it bypasses the stateful defense Blacklight, achieving a 0% detection rate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to establish a unified theoretical framework interpreting existing sign-flipping hard-label black-box attacks as approximations to the true gradient sign. Guided by this analysis, it proposes a zero-query initialization strategy and a Pattern-Driven Optimization (PDO) algorithm, asserting theoretical guarantees that the initialization yields higher cosine similarity to the true gradient sign than random baselines and that PDO achieves significantly lower query complexity than baseline search methods. Extensive experiments on CIFAR-10, ImageNet, ObjectNet, commercial APIs, CLIP models, corrupted data, biomedical images, and segmentation tasks demonstrate consistent outperformance over SOTA hard-label attacks in success rate and efficiency, including bypassing the Blacklight defense.

Significance. If the unified framework provides rigorous, non-heuristic guarantees that transfer to real networks and the empirical gains are driven by the theory rather than implementation details, this could advance principled design of low-query hard-label attacks and improve understanding of existing methods in black-box adversarial robustness evaluation.

major comments (2)
  1. [Abstract] Abstract: the central claim that the unified theoretical framework recasts sign-flipping attacks as gradient-sign approximations and directly yields the stated cosine-similarity and query-complexity guarantees requires explicit derivation and error analysis; without these, it is unclear whether the modeling assumptions hold beyond restricted regimes such as locally linear boundaries.
  2. [Abstract] Abstract: the guarantee that the zero-query initialization achieves strictly higher cosine similarity than random baselines is load-bearing for the novelty claim; if this follows only from the modeling choice rather than a parameter-free derivation, the practical improvements on real networks may not be theoretically grounded.
minor comments (1)
  1. [Experiments] The experimental section should report variance, statistical significance, and exact query-budget protocols to support the 'consistent outperformance' claim across all listed datasets and models.
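As a sketch of the reporting this minor comment asks for, the functions below compute attack success rate (ASR) at fixed query budgets from per-example query counts, recording a failed example as infinity, together with a percentile-bootstrap confidence interval. The input format, the function names, and the choice of bootstrap are assumptions for illustration; the numbers in the usage lines are placeholders, not results from the paper.

```python
import numpy as np


def asr_at_budgets(queries, budgets):
    """Fraction of examples broken within each query budget; failures are np.inf."""
    q = np.asarray(queries, dtype=float)
    return {int(b): float(np.mean(q <= b)) for b in budgets}


def bootstrap_ci(queries, budget, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for ASR at one budget, resampling examples with replacement."""
    rng = np.random.default_rng(seed)
    hits = (np.asarray(queries, dtype=float) <= budget).astype(float)
    idx = rng.integers(0, hits.size, size=(n_boot, hits.size))
    means = hits[idx].mean(axis=1)
    return float(np.quantile(means, alpha / 2)), float(np.quantile(means, 1 - alpha / 2))


placeholder = [10, 50, np.inf, 200, 30]  # hypothetical per-example query counts
print(asr_at_budgets(placeholder, budgets=[100, 1000]))  # {100: 0.6, 1000: 0.8}
print(bootstrap_ci(placeholder, budget=100))
```

Fixing the budgets, seeds, and example set across every baseline, and pairing examples across methods, makes differences between attacks easier to test for significance.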

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the theoretical claims. We address the two major comments point-by-point below and will revise the manuscript to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the unified theoretical framework recasts sign-flipping attacks as gradient-sign approximations and directly yields the stated cosine-similarity and query-complexity guarantees requires explicit derivation and error analysis; without these, it is unclear whether the modeling assumptions hold beyond restricted regimes such as locally linear boundaries.

    Authors: We agree the abstract is too condensed. Section 3 of the manuscript already derives the unified framework by modeling the hard-label query as a sign approximation to the gradient of the underlying loss, with sign-flipping attacks recovered as special cases. To directly address the request, we will add an appendix containing the full step-by-step derivation, explicit error bounds under the locally linear boundary assumption, and a discussion of the regimes (including non-linear boundaries) where the approximation remains useful. This will make the modeling assumptions and their scope transparent. revision: yes

  2. Referee: [Abstract] Abstract: the guarantee that the zero-query initialization achieves strictly higher cosine similarity than random baselines is load-bearing for the novelty claim; if this follows only from the modeling choice rather than a parameter-free derivation, the practical improvements on real networks may not be theoretically grounded.

    Authors: The cosine-similarity guarantee is obtained directly from the framework via a parameter-free comparison of expected inner products: our initialization is constructed from the sign pattern implied by the theoretical model, while the random baseline is drawn uniformly from the sphere; the strict inequality follows from the geometry of the model without additional tunable parameters. We will revise the abstract and add a short clarifying paragraph in Section 3 to state this derivation explicitly and emphasize its parameter-free character. The extensive experiments across real networks, APIs, and non-image domains already provide empirical support that the theoretical advantage translates to practice. revision: yes
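For the random baseline mentioned in this response, the expected cosine similarity can be worked out directly. The sketch assumes the random direction is reduced to its sign pattern (for a uniform draw from the sphere, that pattern is uniform over {−1, +1}^d by symmetry) and that the gradient g has no zero entries; it shows the baseline's mean and spread, not the paper's actual proof. The last line is the classical arcsine law for clipped Gaussian signals, the subject of Van Vleck and Middleton [45] in the paper's reference list; whether the paper's argument takes this route is not visible from the abstract.

```latex
t_i = \operatorname{sign}(g_i), \qquad s_i \overset{\text{iid}}{\sim} \operatorname{Unif}\{-1,+1\}, \qquad
\cos(s,t) = \frac{s^\top t}{\|s\|_2\,\|t\|_2} = \frac{1}{d}\sum_{i=1}^{d} s_i t_i

\mathbb{E}[\cos(s,t)] = 0, \qquad
\operatorname{Var}[\cos(s,t)] = \frac{1}{d^2}\sum_{i=1}^{d}\operatorname{Var}[s_i t_i] = \frac{1}{d}

(a_i, g_i) \text{ zero-mean jointly Gaussian with correlation } \rho:\quad
\mathbb{E}\big[\operatorname{sign}(a_i)\operatorname{sign}(g_i)\big] = \frac{2}{\pi}\arcsin\rho
```

So any initialization whose coordinates are positively correlated with the gradient (ρ > 0) has expected sign agreement above the random baseline's zero; for example, ρ = 0.5 gives (2/π)(π/6) = 1/3, against a random-baseline spread of only 1/√d.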

Circularity Check

0 steps flagged

No circularity: framework and guarantees presented as independent theoretical analysis

full rationale

The paper constructs a unified theoretical framework interpreting sign-flipping attacks as gradient-sign approximations, then states separate theoretical guarantees for zero-query initialization (higher cosine similarity) and PDO (lower query complexity). No quoted equations, definitions, or self-citations in the abstract or description reduce these guarantees to fitted inputs, self-definitional loops, or load-bearing prior self-work by construction. The derivation chain remains self-contained, with empirical validation on external datasets serving as independent check rather than internal fit.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only review limits visibility into parameters or entities; the central modeling choice is treated as an axiom and PDO is treated as a new algorithmic entity.

axioms (1)
  • domain assumption: Existing sign-flipping hard-label attacks can be understood as approximating the true gradient sign.
    This modeling choice is the foundation of the unified theoretical framework described in the abstract.
invented entities (1)
  • Pattern-Driven Optimization (PDO) algorithm (no independent evidence)
    purpose: Achieve lower query complexity than baseline search methods in the attack optimization phase.
    New algorithmic module introduced by the paper.

pith-pipeline@v0.9.0 · 5778 in / 1345 out tokens · 39996 ms · 2026-05-25T07:09:56.484850+00:00 · methodology


Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages

  1. [1]

    Abdullah Al-Dujaili and Una-May O’Reilly. 2020. Sign bits are all you need for black-box attacks. In ICLR

  2. [2]

    Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. 2020. Square attack: a query-efficient black-box adversarial attack via random search. In ECCV

  3. [3]

    Apple. 2023. Apple Core ML. https://developer.apple.com/cn/machine-learning/core-ml/

  4. [4]

    Baidu. 2024. Image Recognition. https://cloud.baidu.com/doc/IMAGERECOGNITION/index.html

  5. [5]

    Andrei Barbu, David Mayo, Julian Alverio, William Luo, Christopher Wang, Dan Gutfreund, Josh Tenenbaum, and Boris Katz. 2019. ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models. In NeurIPS, Vol. 32

  6. [6]

    Wieland Brendel, Jonas Rauber, and Matthias Bethge. 2018. Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models. In ICLR

  7. [7]

    Jinghui Chen and Quanquan Gu. 2020. RayS: A Ray Searching Method for Hard-label Adversarial Attack. In KDD

  8. [8]

    Jianbo Chen, Michael I Jordan, and Martin J Wainwright. 2020. HopSkipJumpAttack: A query-efficient decision-based attack. In IEEE S&P

  9. [9]

    Yiting Chen, Qibing Ren, and Junchi Yan. 2022. Rethinking and improving robustness of convolutional neural networks: a shapley value-based approach in frequency domain. NeurIPS (2022)

  10. [10]

    Minhao Cheng, Thong Le, Pin-Yu Chen, Huan Zhang, JinFeng Yi, and Cho-Jui Hsieh. 2019. Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach. In ICLR

  11. [11]

    Minhao Cheng, Simranjit Singh, Patrick Chen, Pin Yu Chen, Sijia Liu, and Cho Jui Hsieh. 2020. Sign-OPT: A Query-Efficient Hard-label Adversarial Attack. In ICLR

  12. [12]

    Minhao Cheng, Huan Zhang, Cho-Jui Hsieh, Thong Le, Pin-Yu Chen, and Jinfeng Yi. 2019. Query-efficient hard-label black-box attack: An optimization-based approach. In ICLR

  13. [13]

    Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, and Matthias Hein. 2021. RobustBench: a standardized adversarial robustness benchmark. In NeurIPS

  14. [14]

    Francesco Croce and Matthias Hein. 2020. Minimally Distorted Adversarial Examples with a Fast Adaptive Boundary Attack. In ICML

  15. [15]

    Media Cybernetics. [n. d.]. Mediacy. https://mediacy.com/

  16. [16]

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In CVPR. IEEE, 248–255

  17. [17]

    Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. 2019. ArcFace: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 4690–4699

  18. [18]

    Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li. 2018. Boosting Adversarial Attacks with Momentum. In CVPR

  19. [19]

    Logan Engstrom, Andrew Ilyas, Hadi Salman, Shibani Santurkar, and Dimitris Tsipras. 2019. Robustness. https://github.com/MadryLab/robustness

  20. [20]

    Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, and Qian Wang. 2024. Zero-query adversarial attack on black-box automatic speech recognition systems. In ACM CCS. 630–644

  21. [21]

    Rafael C. Gonzalez and Richard E. Woods. 2018. Digital Image Processing (4th ed.). Pearson

  22. [22]

    Google. [n. d.]. Google Cloud Vision API. https://cloud.google.com/vision?hl=en

  23. [23]

    Chuan Guo, Jacob Gardner, Yurong You, Andrew Gordon Wilson, and Kilian Weinberger. 2019. Simple black-box adversarial attacks. In ICLR

  24. [24]

    Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens van der Maaten. 2018. Countering Adversarial Images using Input Transformations. In ICLR

  25. [25]

    Dan Hendrycks and Thomas Dietterich. 2019. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. In ICLR

  26. [26]

    Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In ICLR

  27. [27]

    Gabriel Ilharco, Mitchell Wortsman, Ross Wightman, Cade Gordon, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi, and Ludwig Schmidt. 2021. OpenCLIP. doi:10.5281/zenodo.5143773

  28. [28]

    Andrew Ilyas, Logan Engstrom, and Aleksander Madry. 2019. Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors. In ICLR

  29. [29]

    Imagga. [n. d.]. AI-Powered Image Tagging API. https://imagga.com/solutions/auto-tagging

  30. [30]

    Shuaifan Jin, He Wang, Zhibo Wang, Feng Xiao, Jiahui Hu, Yuan He, Wenwen Zhang, Zhongjie Ba, Weijie Fang, Shuhong Yuan, et al. 2024. Defending Deep Learning-based Privacy Attacks with Gradient Descent-resistant Features in Face Recognition. In USENIX

  31. [31]

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In ICCV. 4015–4026

  33. [33]

    Alex Krizhevsky, Geoffrey Hinton, et al. 2009. Learning multiple layers of features from tiny images. (2009)

  34. [34]

    Fengpeng Li, Kemou Li, Haiwei Wu, Jinyu Tian, and Jiantao Zhou. 2024. DAT: Improving Adversarial Robustness via Generative Amplitude Mix-up in Frequency Domain. In NeurIPS. 127099–127128

  35. [35]

    Fengpeng Li, Kemou Li, Haiwei Wu, Jinyu Tian, and Jiantao Zhou. 2025. Toward Robust Learning via Core Feature-Aware Adversarial Training. IEEE TIFS 20 (2025), 6236–6251

  36. [36]

    Huiying Li, Shawn Shan, Emily Wenger, Jiayun Zhang, Haitao Zheng, and Ben Y Zhao. 2022. Blacklight: Scalable defense for neural networks against Query-Based Black-Box attacks. In USENIX

  37. [37]

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In ECCV. Springer, 740–755

  38. [38]

    Chen Ma, Xiangyu Guo, Li Chen, Jun-Hai Yong, and Yisen Wang. 2021. Finding optimal tangent points for reducing distortions of hard-label attacks. NeurIPS (2021)

  39. [39]

    Thibault Maho, Teddy Furon, and Erwan Le Merrer. 2021. SurFree: a fast surrogate-free black-box attack. In CVPR

  40. [40]

    Seungyong Moon, Gaon An, and Hyun Oh Song. 2019. Parsimonious black-box adversarial attacks via efficient combinatorial optimization. In ICLR

  41. [41]

    PyTorch. 2023. https://github.com/pytorch/vision

  42. [42]

    Meng Shen, Changyue Li, Qi Li, Hao Lu, Liehuang Zhu, and Ke Xu. 2024. Transferability of white-box perturbations: query-efficient adversarial attacks against commercial DNN services. In USENIX

  43. [43]

    Tencent. 2024. Tencent General Image Label. https://cloud.tencent.com/document/product/865/75196

  44. [44]

    Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. 2018. Ensemble Adversarial Training: Attacks and Defenses. In ICLR

  45. [45]

    J.H. Van Vleck and D. Middleton. 1966. The spectrum of clipped noise. Proc. IEEE 54, 1 (1966), 2–19.

  46. [46]

    Viet Quoc Vo, Ehsan Abbasnejad, and Damith C Ranasinghe. 2022. RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit. In NDSS

  47. [47]

    Gregory K Wallace. 1991. The JPEG still picture compression standard. Commun. ACM 34, 4 (1991), 30–44

  48. [48]

    Jie Wan, Jianhao Fu, Lijin Wang, and Ziqi Yang. 2024. BounceAttack: A query-efficient decision-based adversarial attack by bouncing into the wild. In IEEE S&P

  49. [49]

    Feiyang Wang, Xingquan Zuo, Hai Huang, and Gang Chen. 2025. ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks. AAAI (2025)

  50. [50]

    Feiyang Wang, Xingquan Zuo, Hai Huang, and Gang Chen. 2025. TtBA: Two-third Bridge Approach for Decision-Based Adversarial Attack. In ICML

  51. [51]

    Binyan Xu, Xilin Dai, Di Tang, and Kehuan Zhang. 2025. One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP. In ACM CCS. 3087–3101

  52. [52]

    Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, and Bingbing Ni. 2023. MedMNIST v2: a large-scale lightweight benchmark for 2D and 3D biomedical image classification. Scientific Data 10, 1 (2023), 41

  53. [53]

    Dong Yin, Raphael Gontijo Lopes, Jon Shlens, Ekin Dogus Cubuk, and Justin Gilmer. 2019. A Fourier perspective on model robustness in computer vision. NeurIPS 32 (2019)

  54. [54]

    H. Zhang, Y. Avrithis, T. Furon, and L. Amsaleg. 2021. Walking on the Edge: Fast, Low-Distortion Adversarial Examples. IEEE TIFS (2021)

  55. [55]

    Baolin Zheng, Peipei Jiang, Qian Wang, Qi Li, Chao Shen, Cong Wang, Yunjie Ge, Qingyang Teng, and Shenyi Zhang. 2021. Black-box adversarial attacks on commercial speech platforms with minimal information. In ACM CCS. 86–107