pith. sign in

arxiv: 1902.06894 · v4 · pith:ZE2EZ6BMnew · submitted 2019-02-19 · 💻 cs.LG · cs.CR· stat.ML

There are No Bit Parts for Sign Bits in Black-Box Attacks

classification 💻 cs.LG cs.CRstat.ML
keywords algorithmattacksblack-boxmodelattackqueriesadversarialdatasets
0
0 comments X
read the original abstract

We present a black-box adversarial attack algorithm which sets new state-of-the-art model evasion rates for query efficiency in the $\ell_\infty$ and $\ell_2$ metrics, where only loss-oracle access to the model is available. On two public black-box attack challenges, the algorithm achieves the highest evasion rate, surpassing all of the submitted attacks. Similar performance is observed on a model that is secure against substitute-model attacks. For standard models trained on the MNIST, CIFAR10, and IMAGENET datasets, averaged over the datasets and metrics, the algorithm is 3.8x less failure-prone, and spends in total 2.5x fewer queries than the current state-of-the-art attacks combined given a budget of 10, 000 queries per attack attempt. Notably, it requires no hyperparameter tuning or any data/time-dependent prior. The algorithm exploits a new approach, namely sign-based rather than magnitude-based gradient estimation. This shifts the estimation from continuous to binary black-box optimization. With three properties of the directional derivative, we examine three approaches to adversarial attacks. This yields a superior algorithm breaking a standard MNIST model using just 12 queries on average!

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.