Pith · machine review for the scientific record

arXiv: 1511.07528 · v1 · submitted 2015-11-24 · 💻 cs.CR · cs.LG · cs.NE · stat.ML

Recognition: unknown

The Limitations of Deep Learning in Adversarial Settings

Authors on Pith: no claims yet
classification 💻 cs.CR · cs.LG · cs.NE · stat.ML
keywords adversarial · deep · samples · algorithms · learning · networks · neural · adversaries
Original abstract

Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified in specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.
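The "precise understanding of the mapping between inputs and outputs" the abstract refers to is the network's forward derivative (its Jacobian), from which the paper builds adversarial saliency maps that rank input features by how strongly a small increase pushes the output toward a chosen target class. A minimal sketch of that idea on a toy linear classifier; the model, function names, and parameters here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def saliency_map(jacobian, target):
    """Rank features in the spirit of a forward-derivative saliency map:
    a feature is salient if increasing it raises the target class score
    while lowering the summed scores of all other classes."""
    dt = jacobian[target]                    # d(target logit) / d(input)
    d_others = jacobian.sum(axis=0) - dt     # summed over non-target classes
    return np.where((dt > 0) & (d_others < 0), dt * np.abs(d_others), 0.0)

def craft(x, W, target, step=0.2, max_iter=50):
    """Greedily perturb one feature at a time on a toy linear model
    with logits = W @ x, whose Jacobian is simply W (no autodiff needed)."""
    x = x.copy()
    for _ in range(max_iter):
        if np.argmax(W @ x) == target:       # already misclassified as target
            break
        s = saliency_map(W, target)
        s[x >= 1.0] = 0.0                    # skip features already saturated
        if s.max() <= 0:
            break                            # no helpful feature remains
        i = int(np.argmax(s))
        x[i] = min(x[i] + step, 1.0)         # modify a single input feature
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))                  # 3 classes, 8 input features
x = rng.uniform(size=8)
adv = craft(x, W, target=2)
print("features changed:", int(np.count_nonzero(adv != x)))
```

Because only the highest-saliency feature is touched per iteration, the perturbation stays sparse, which mirrors the abstract's claim of modifying roughly 4% of input features per sample; on a real DNN the Jacobian would come from automatic differentiation rather than the weight matrix.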

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Out-of-the-box: Black-box Causal Attacks on Object Detectors

cs.CV · 2025-12 · unverdicted · novelty 6.0

    BlackCAtt creates smaller, explainable black-box attacks on object detectors by targeting minimal causal pixel sets, outperforming or matching standard methods and acting as a meta-algorithm when combined with them.