Detecting Adversarial Samples from Artifacts

Andrew B. Gardner; Reuben Feinman; Ryan R. Curtin; Saurabh Shintre

arxiv 1703.00410 v3 pith:JUHND5XE submitted 2017-03-01 stat.ML cs.LG

Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, Andrew B. Gardner

classification stat.ML cs.LG

keywords adversarial, samples, model, architectures, deep, input, method, networks

Deep neural networks (DNNs) are powerful nonlinear architectures that are known to be robust to random perturbations of the input. However, these models are vulnerable to adversarial perturbations--small input changes crafted explicitly to fool the model. In this paper, we ask whether a DNN can distinguish adversarial samples from their normal and noisy counterparts. We investigate model confidence on adversarial samples by looking at Bayesian uncertainty estimates, available in dropout neural networks, and by performing density estimation in the subspace of deep features learned by the model. The result is a method for implicit adversarial detection that is oblivious to the attack algorithm. We evaluate this method on a variety of standard datasets including MNIST and CIFAR-10 and show that it generalizes well across different architectures and attacks. Our findings report that 85-93% ROC-AUC can be achieved on a number of standard classification tasks with a negative class that consists of both normal and noisy samples.
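The two signals named in the abstract, dropout-based uncertainty and density in the deep feature space, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the variance-style uncertainty score, the Gaussian kernel, and the `bandwidth` parameter are all assumptions chosen for illustration. The sketch assumes the softmax outputs of several stochastic (dropout-enabled) forward passes and the last-hidden-layer features have already been collected.

```python
import numpy as np


def mc_dropout_uncertainty(probs):
    """Variance-style uncertainty from T stochastic forward passes.

    probs: array-like of shape (T, num_classes), one softmax output per
    dropout-enabled pass (hypothetical input; the model is not shown).
    """
    probs = np.asarray(probs, dtype=float)
    mean_probs = probs.mean(axis=0)
    # Mean squared norm minus squared norm of the mean:
    # the total variance of the predictions across passes.
    return float((probs * probs).sum(axis=1).mean() - mean_probs @ mean_probs)


def kernel_density(feature, class_features, bandwidth=1.0):
    """Gaussian kernel density of one feature vector.

    feature: shape (d,), deep features of the sample under test.
    class_features: shape (n, d), deep features of training samples
    from the predicted class (assumed to be precomputed).
    bandwidth: illustrative kernel width, to be tuned per dataset.
    """
    feature = np.asarray(feature, dtype=float)
    class_features = np.asarray(class_features, dtype=float)
    sq_dists = ((class_features - feature) ** 2).sum(axis=1)
    return float(np.exp(-sq_dists / bandwidth ** 2).mean())
```

Under these assumptions, a sample with high uncertainty and low density would be the more suspicious one. The two scores could then be fed to a simple classifier, such as logistic regression, to produce a detector that can be evaluated by ROC-AUC.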

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP
cs.CV 2026-06 unverdicted novelty 7.0

A^4D is a classifier- and attack-agnostic zero-shot adversarial attack detector based on CLIP embedding shifts that claims SOTA performance.
A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP
cs.CV 2026-06 unverdicted novelty 7.0

A^4D detects adversarial attacks in an attack- and classifier-agnostic way by measuring non-arbitrary shifts in CLIP embedding space from prompt-based similarity scores.
DPAgent-in-the-Middle: Agentic Defense and Repair Against AI-Groomed Deceptive Patterns
cs.CR 2026-06 unverdicted novelty 7.0

DPAgent is an agentic framework that detects 90.98% of AI-groomed deceptive samples and repairs 77% of deceptive interfaces while exploring 80% of pattern types with 10% of baseline page visits.
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
cs.CV 2024-06 unverdicted novelty 7.0

MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.
Stateful Detection of Black-Box Adversarial Attacks
cs.CR 2019-07 unverdicted novelty 7.0

The paper argues for stateful defenses over stateless ones to detect adversarial example generation via query history and introduces query blinding as a counter-attack.
AdvScan: Black-Box Adversarial Example Detection at Runtime through Power Analysis
cs.CR 2026-06 unverdicted novelty 6.0

AdvScan detects adversarial examples in black-box TinyML on ARM Cortex-M devices via one-sample t-test on runtime power signatures against a benign baseline, reporting 99.984% detection with 40 false negatives and zer...
Spectrally unstable nodes drive reliability failures in graph learning
cs.LG 2024-12 unverdicted novelty 5.0

Spectrally unstable nodes are identified via graph-spectral distortion analysis as primary drivers of reliability failures; isolating them yields a stable subgraph for learning with propagation-based recovery for the ...
AEGIS: A Semantic GAN and Evidential Learning Framework for Robust Adversarial Detection in Vision Sensors
cs.CV 2026-06 unverdicted novelty 4.0

AEGIS combines SemantiGAN filtering with evidential learning on five handcrafted instability metrics to detect adversarial attacks, reporting 92.1% AUROC on Tiny ImageNet across six attack types.