Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders
classification
💻 cs.LG
cs.CR, cs.CV, stat.ML
keywords
adversarial, training, models, other, attacks, directly, expensive, iterative
abstract
Machine learning models are vulnerable to adversarial examples. Iterative adversarial training has shown promising results against strong white-box attacks. However, adversarial training is very expensive, and this expensive training scheme must be repeated every time a model needs to be protected. In this paper, we propose applying an iterative adversarial training scheme to an external auto-encoder, which, once trained, can be used to protect other models directly. We empirically show that our model outperforms other purification-based methods against white-box attacks and transfers well to directly protect other base models with different architectures.
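The idea in the abstract (craft iterative adversarial examples against the purifier-plus-classifier pipeline, then update only the purifier) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the element-wise scaling "auto-encoder", the fixed linear classifier with logistic loss, the sign-gradient attack with an L-infinity bound, and all names and hyperparameters (`purify`, `pgd`, `adversarial_train_purifier`, `eps`, `step`, `lr`). It is not the paper's architecture, loss, or training recipe.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def purify(a, x):
    # Toy stand-in for the auto-encoder: element-wise scaling g(x) = a * x.
    return [ai * xi for ai, xi in zip(a, x)]

def loss_and_grads(a, w, x, y):
    # Fixed linear classifier on the purified input, logistic loss, y in {0, 1}.
    z = sum(wi * pi for wi, pi in zip(w, purify(a, x)))
    p = 1.0 / (1.0 + math.exp(-z))
    loss = math.log1p(math.exp(-z)) if y == 1 else math.log1p(math.exp(z))
    dz = p - y  # derivative of the loss with respect to the logit z
    grad_x = [dz * wi * ai for wi, ai in zip(w, a)]  # gradient w.r.t. the input
    grad_a = [dz * wi * xi for wi, xi in zip(w, x)]  # gradient w.r.t. the purifier
    return loss, grad_x, grad_a

def pgd(a, w, x, y, eps=0.1, step=0.02, iters=10):
    # Iterative sign-gradient ascent on the loss, projected onto the
    # L-infinity ball of radius eps around the clean input x.
    x_adv = list(x)
    for _ in range(iters):
        _, gx, _ = loss_and_grads(a, w, x_adv, y)
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, gx)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

def adversarial_train_purifier(a, w, data, lr=0.1, epochs=20, eps=0.1):
    # Only the purifier parameters `a` are updated; the classifier `w` stays fixed.
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd(a, w, x, y, eps=eps)
            _, _, ga = loss_and_grads(a, w, x_adv, y)
            a = [ai - lr * g for ai, g in zip(a, ga)]
    return a

def mean_adv_loss(a, w, data, eps=0.1):
    # Average loss on adversarial examples crafted against the current purifier.
    losses = [loss_and_grads(a, w, pgd(a, w, x, y, eps=eps), y)[0] for x, y in data]
    return sum(losses) / len(losses)

if __name__ == "__main__":
    w = [1.0, 1.0]                       # fixed base classifier (hypothetical)
    a0 = [0.1, 0.1]                      # initial purifier parameters (hypothetical)
    data = [([1.0, 0.5], 1), ([-1.0, -0.5], 0)]
    a1 = adversarial_train_purifier(a0, w, data)
    print("adversarial loss before:", mean_adv_loss(a0, w, data))
    print("adversarial loss after: ", mean_adv_loss(a1, w, data))
```

In this sketch the classifier weights never change, which mirrors the abstract's point that the base model itself does not need adversarial retraining. Whether a purifier trained this way transfers to other base models is the paper's empirical claim and is not demonstrated by the toy example.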