pith. machine review for the scientific record.

arxiv: 1901.08573 · v3 · submitted 2019-01-24 · 💻 cs.LG · stat.ML

Recognition: unknown

Theoretically Principled Trade-off between Robustness and Accuracy

Authors on Pith: no claims yet
classification 💻 cs.LG stat.ML
keywords: adversarial, error, accuracy, robustness, trade-off, bound, design, examples
0 comments
Original abstract

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally in real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge in which we won the 1st place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance.
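The decomposition of robust error into natural error plus boundary error motivates a training objective that adds a robustness regularizer to the standard classification loss. A minimal NumPy sketch of that surrogate is below, assuming the logits for a clean input and for an adversarially perturbed copy are already computed (the inner maximization that searches for the perturbation is omitted); the function name `trades_loss` and the default `beta` are illustrative, not taken from the authors' released code.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def trades_loss(logits_nat, logits_adv, y, beta=6.0):
    """Sketch of a TRADES-style objective: natural cross-entropy on the
    clean input plus beta times the KL divergence between the model's
    predictions on the clean and perturbed inputs. With identical
    logits the KL term vanishes and this reduces to plain cross-entropy."""
    p_nat = softmax(logits_nat)
    p_adv = softmax(logits_adv)
    n = len(y)
    # Cross-entropy of the clean prediction against the true label y.
    ce = -np.log(p_nat[np.arange(n), y] + 1e-12).mean()
    # KL(p_nat || p_adv): penalizes predictions that move under perturbation.
    kl = (p_nat * (np.log(p_nat + 1e-12) - np.log(p_adv + 1e-12))).sum(axis=-1).mean()
    return ce + beta * kl
```

The hyperparameter `beta` is what trades accuracy off against robustness: larger values weight invariance to perturbations more heavily relative to fitting the clean labels.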

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Laundering AI Authority with Adversarial Examples

    cs.CR 2026-05 unverdicted novelty 5.0

    Adversarial examples enable AI authority laundering by causing production VLMs to give authoritative but wrong responses on subtly perturbed images, with success rates of 22-100% using decade-old attack methods.

  2. Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing

    cs.CR 2026-04 unverdicted novelty 5.0

    Auto-ART delivers the first structured synthesis of adversarial robustness consensus plus an executable multi-norm testing framework that flags gradient masking in 92% of cases on RobustBench and reveals a 23.5 pp rob...