arxiv: 1703.00410 · v3 · pith:JUHND5XEnew · submitted 2017-03-01 · 📊 stat.ML · cs.LG

Detecting Adversarial Samples from Artifacts

keywords: adversarial, samples, model, architectures, deep, input, method, networks

Deep neural networks (DNNs) are powerful nonlinear architectures that are known to be robust to random perturbations of the input. However, these models are vulnerable to adversarial perturbations--small input changes crafted explicitly to fool the model. In this paper, we ask whether a DNN can distinguish adversarial samples from their normal and noisy counterparts. We investigate model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model. The result is a method for implicit adversarial detection that is oblivious to the attack algorithm. We evaluate this method on a variety of standard datasets including MNIST and CIFAR-10 and show that it generalizes well across different architectures and attacks. Our findings report that 85-93% ROC-AUC can be achieved on a number of standard classification tasks with a negative class that consists of both normal and noisy samples.
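
The two signals named in the abstract, dropout-based uncertainty and density in deep feature space, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the variance-style uncertainty formula, the Gaussian kernel, and the `bandwidth` parameter are all assumptions chosen for illustration, and the inputs are assumed to be softmax outputs from repeated dropout passes and last-hidden-layer feature vectors.

```python
import math


def dropout_uncertainty(prob_samples):
    """Variance-style uncertainty from T stochastic (dropout) forward passes.

    prob_samples: list of T probability vectors for one input.
    Returns mean(y . y) - mean(y) . mean(y), which is 0 when all passes agree.
    (Assumed form; the abstract does not specify the estimator.)
    """
    t = len(prob_samples)
    dim = len(prob_samples[0])
    mean_sq = sum(sum(p * p for p in y) for y in prob_samples) / t
    mean_vec = [sum(y[k] for y in prob_samples) / t for k in range(dim)]
    return mean_sq - sum(m * m for m in mean_vec)


def kernel_density(feature, class_features, bandwidth=1.0):
    """Gaussian kernel density of `feature` against training features.

    class_features: deep-feature vectors of training points from the
    predicted class (hypothetical input; bandwidth is a free parameter).
    """
    total = 0.0
    for z in class_features:
        sq_dist = sum((a - b) ** 2 for a, b in zip(feature, z))
        total += math.exp(-sq_dist / bandwidth ** 2)
    return total / len(class_features)
```

Under these assumptions, an input with high uncertainty or low density would be a candidate adversarial sample. One possible next step is to feed both scores to a simple binary classifier and report ROC-AUC on held-out data.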

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

    cs.CV 2024-06 unverdicted novelty 7.0

    MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.

  2. Stateful Detection of Black-Box Adversarial Attacks

    cs.CR 2019-07 unverdicted novelty 7.0

    The paper argues for stateful defenses over stateless ones to detect adversarial example generation via query history and introduces query blinding as a counter-attack.

  3. Spectrally unstable nodes drive reliability failures in graph learning

    cs.LG 2024-12 unverdicted novelty 5.0

    Spectrally unstable nodes are identified via graph-spectral distortion analysis as primary drivers of reliability failures; isolating them yields a stable subgraph for learning with propagation-based recovery for the ...