Transferable Perturbations of Deep Feature Distributions

Kevin J Liang; Lawrence Carin; Nathan Inkawhich; Yiran Chen

arxiv: 2004.12519 · v1 · pith:OFE3F7H3new · submitted 2020-04-27 · 💻 cs.LG · stat.ML

keywords: adversarial, feature, distributions, attack, attacks, class-wise, deep, layer-wise

Almost all current adversarial attacks of CNN classifiers rely on information derived from the output layer of the network. This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions. We achieve state-of-the-art targeted blackbox transfer-based attack results for undefended ImageNet models. Further, we place a priority on explainability and interpretability of the attacking process. Our methodology affords an analysis of how adversarial attacks change the intermediate feature distributions of CNNs, as well as a measure of layer-wise and class-wise feature distributional separability/entanglement. We also conceptualize a transition from task/data-specific to model-specific features within a CNN architecture that directly impacts the transferability of adversarial examples.
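
The idea of attacking through a class-wise, layer-wise feature distribution can be illustrated with a minimal sketch. The abstract does not say how the distributions are modeled, so everything concrete here is an assumption made for illustration: the toy layer `features` (a ReLU of a random linear map standing in for an intermediate CNN layer), the diagonal Gaussian fitted to the target class's features, the finite-difference gradient, and all names and parameter values (`perturb_toward_class`, `eps`, `step`, `iters`). The sketch perturbs an input within an L-infinity budget so that its intermediate features become more likely under the target class's feature model.

```python
import math
import random

def features(x, W):
    # Toy stand-in (assumption) for one intermediate CNN layer: ReLU(W @ x).
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def fit_diag_gaussian(feats):
    # Assumed class-wise model: per-dimension mean and variance of the
    # features of one class at this layer (small floor keeps variance > 0).
    n, dim = len(feats), len(feats[0])
    mean = [sum(f[d] for f in feats) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in feats) / n + 1e-3
           for d in range(dim)]
    return mean, var

def log_lik(f, model):
    # Log-density of feature vector f under the diagonal Gaussian model.
    mean, var = model
    return sum(-0.5 * math.log(2 * math.pi * v) - (fi - m) ** 2 / (2 * v)
               for fi, m, v in zip(f, mean, var))

def perturb_toward_class(x, W, model, eps=0.5, step=0.05, iters=40, h=1e-4):
    # Signed-gradient ascent on the target class's feature log-likelihood,
    # projected onto the L-infinity ball of radius eps around x.
    # Returns the best-scoring iterate seen (the original x if none improves).
    def score(z):
        return log_lik(features(z, W), model)

    cur = list(x)
    best, best_score = list(x), score(x)
    for _ in range(iters):
        base = score(cur)
        grad = []
        for i in range(len(cur)):
            bumped = list(cur)
            bumped[i] += h
            grad.append((score(bumped) - base) / h)  # forward difference
        cur = [min(xi + eps, max(xi - eps, c + step * ((g > 0) - (g < 0))))
               for xi, c, g in zip(x, cur, grad)]
        s = score(cur)
        if s > best_score:
            best, best_score = list(cur), s
    return best

# Toy data (assumption): two classes that differ by their input-space centers.
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]

def sample(center):
    return [c + rng.gauss(0, 0.1) for c in center]

target_inputs = [sample([0, 0, 1, 1]) for _ in range(50)]
target_model = fit_diag_gaussian([features(z, W) for z in target_inputs])

x = sample([1, 1, 0, 0])  # a source-class input
x_adv = perturb_toward_class(x, W, target_model)
before = log_lik(features(x, W), target_model)
after = log_lik(features(x_adv, W), target_model)
print("target-class feature log-likelihood:", before, "->", after)
```

In a real setting the toy layer would be replaced by activations taken from a chosen layer of a whitebox CNN, and the gradient would come from automatic differentiation rather than finite differences. Which layer is attacked matters, since the abstract ties transferability to where features shift from task/data-specific to model-specific.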
