pith. machine review for the scientific record. sign in

arxiv: 1412.5068 · v4 · submitted 2014-12-11 · 💻 cs.LG · cs.CV· cs.NE

Recognition: unknown

Towards Deep Neural Network Architectures Robust to Adversarial Examples

Authors on Pith no claims yet
classification 💻 cs.LG cs.CVcs.NE
keywords adversarialexamplesnetworkdeepcontractivedaesdnnsneural
0
0 comments X
read the original abstract

Recent work has shown deep neural networks (DNNs) to be highly susceptible to well-designed, small perturbations at the input layer, or so-called adversarial examples. Taking images as an example, such distortions are often imperceptible, but can result in 100% mis-classification for a state of the art DNN. We study the structure of adversarial examples and explore network topology, pre-processing and training strategies to improve the robustness of DNNs. We perform various experiments to assess the removability of adversarial examples by corrupting with additional noise and pre-processing with denoising autoencoders (DAEs). We find that DAEs can remove substantial amounts of the adversarial noise. How- ever, when stacking the DAE with the original DNN, the resulting network can again be attacked by new adversarial examples with even smaller distortion. As a solution, we propose Deep Contractive Network, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE). This increases the network robustness to adversarial examples, without a significant performance penalty.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Baseline Defenses for Adversarial Attacks Against Aligned Language Models

    cs.LG 2023-09 conditional novelty 6.0

    Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.