pith. sign in

Theoretically Principled Trade-off between Robustness and Accuracy

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it
abstract

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally in real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge in which we won the 1st place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance.

citation-role summary

background 1

citation-polarity summary

fields

cs.CR 3 cs.LG 3

years

2026 5 2019 1

verdicts

UNVERDICTED 6

roles

background 1

polarities

background 1

clear filters

representative citing papers

Landseer: Exploring the Machine Learning Defense Landscape

cs.CR · 2026-05-26 · unverdicted · novelty 6.0

Landseer offers a containerized modular system to integrate and evaluate combinations of machine learning defenses, with an initial analysis of 35 defenses highlighting replicability challenges.

SORA: Free Second-Order Attacks in Fast Adversarial Training

cs.LG · 2026-05-30 · unverdicted · novelty 5.0

SORA is an adaptive step-size adversarial training algorithm that formalizes epsilon overfitting, introduces the PertAlign metric to predict catastrophic overfitting, and dynamically adjusts perturbations to achieve state-of-the-art robustness and clean accuracy with fixed hyperparameters.

Laundering AI Authority with Adversarial Examples

cs.CR · 2026-05-05 · unverdicted · novelty 5.0

Adversarial examples enable AI authority laundering by causing production VLMs to give authoritative but wrong responses on subtly perturbed images, with success rates of 22-100% using decade-old attack methods.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • Landseer: Exploring the Machine Learning Defense Landscape cs.CR · 2026-05-26 · unverdicted · none · ref 121 · internal anchor

    Landseer offers a containerized modular system to integrate and evaluate combinations of machine learning defenses, with an initial analysis of 35 defenses highlighting replicability challenges.

  • A combination of noise and bilateral filters achieve supralinear and scalable adversarial robustness in CNNs cs.LG · 2026-06-01 · unverdicted · none · ref 57 · internal anchor

    A preprocessor of Gaussian noise plus bilateral filtering yields supralinear adversarial robustness in CNNs and, when paired with adversarial training, ranks near the top of RobustBench while using far less compute, parameters, epochs, and data than prior defenses.

  • SORA: Free Second-Order Attacks in Fast Adversarial Training cs.LG · 2026-05-30 · unverdicted · none · ref 9 · internal anchor

    SORA is an adaptive step-size adversarial training algorithm that formalizes epsilon overfitting, introduces the PertAlign metric to predict catastrophic overfitting, and dynamically adjusts perturbations to achieve state-of-the-art robustness and clean accuracy with fixed hyperparameters.

  • Laundering AI Authority with Adversarial Examples cs.CR · 2026-05-05 · unverdicted · none · ref 71

    Adversarial examples enable AI authority laundering by causing production VLMs to give authoritative but wrong responses on subtly perturbed images, with success rates of 22-100% using decade-old attack methods.

  • Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing cs.CR · 2026-04-22 · unverdicted · none · ref 7

    Auto-ART delivers the first structured synthesis of adversarial robustness consensus plus an executable multi-norm testing framework that flags gradient masking in 92% of cases on RobustBench and reveals a 23.5 pp robustness gap.