Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks
read the original abstract
Although deep neural networks (DNNs) have achieved great success in many tasks, they can often be fooled by \emph{adversarial examples} that are generated by adding small but purposeful distortions to natural examples. Previous studies to defend against adversarial examples mostly focused on refining the DNN models, but have either shown limited success or required expensive computation. We propose a new strategy, \emph{feature squeezing}, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model's prediction on the original input with that on squeezed inputs, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These simple strategies are inexpensive and complementary to other defenses, and can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks.
This paper has not been read by Pith yet.
Forward citations
Cited by 13 Pith papers
-
Adversarial Hubness in Multi-Modal Retrieval
Adversarial hubs can be generated to be retrieved as top-1 for over 84% of test queries in text-to-image retrieval, far exceeding natural hubs.
-
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.
-
Stateful Detection of Black-Box Adversarial Attacks
The paper argues for stateful defenses over stateless ones to detect adversarial example generation via query history and introduces query blinding as a counter-attack.
-
Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern
Non-overlapping RGB-T adversarial patterns on clothing, optimized with spatial discrete-continuous optimization, achieve high attack success rates against multiple RGB-T detector fusion architectures in both digital a...
-
Detecting Adversarial Data via Provable Adversarial Noise Amplification
A provable adversarial noise amplification theorem under sufficient conditions enables a custom-trained detector that identifies adversarial examples at inference time using enhanced layer-wise noise signals.
-
TRAP: Tail-aware Ranking Attack for World-Model Planning
TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on c...
-
ORCA: An Agentic Reasoning Framework for Hallucination and Adversarial Robustness in Vision-Language Models
ORCA is an agentic reasoning framework that enhances factual accuracy and adversarial robustness of pretrained LVLMs via an Observe-Reason-Critique-Act loop with small vision models, reporting accuracy gains of up to ...
-
ORCA: An Agentic Reasoning Framework for Hallucination and Adversarial Robustness in Vision-Language Models
ORCA is an inference-time agentic framework that boosts LVLM accuracy on hallucination benchmarks by 3.64-40.67% and adds adversarial robustness via cross-model validation with small vision tools.
-
Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering
Multi-level Floyd-Steinberg dithering defends DINOv2 and PaliGemma models against PGD, MI-FGSM and SIA attacks on six tasks while causing less clean-input degradation than diffusion denoising or other baselines.
-
Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
-
Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings
Generative purification with consensus aggregation reduces adversarial illusion attack success rates to near zero on ImageBind while improving alignment on both clean and attacked inputs.
-
Survival of the Cheapest: Cost-Aware Hardware Adaptation for Adversarial Robustness
A decision-support framework applies AFT models to show Nvidia L4 GPUs yield 20% longer adversarial survival time at 75% lower cost than V100, with inference latency as the strongest robustness predictor.
-
Comparative Insights on Adversarial Machine Learning from Industry and Academia: A User-Study Approach
Surveys indicate cybersecurity education correlates with higher concern for adversarial ML threats, and CTF-based challenges effectively engage students in learning about data poisoning attacks.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.