Uncovering the limits of adversarial training against norm-bounded adversarial examples
6 Pith papers cite this work.
citing papers explorer
- Adversarial Robustness in One-Stage Learning-to-Defer
  Develops the first adversarial robustness framework for one-stage learning-to-defer, including cost-sensitive surrogate losses and theoretical consistency guarantees for classification and regression.
- Towards Generalized Certified Robustness with Multi-Norm Training
  CURE is the first multi-norm certified training method that improves union robustness across l_p norms and unseen perturbations on MNIST, CIFAR-10 and TinyImagenet.
- Detecting Adversarial Data via Provable Adversarial Noise Amplification
  A provable adversarial noise amplification theorem under sufficient conditions enables a custom-trained detector that identifies adversarial examples at inference time using enhanced layer-wise noise signals.
- Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
  SAAD adaptively weights adversarial training samples by their transferability to the teacher, yielding higher AutoAttack robustness than prior distillation methods on CIFAR and Tiny-ImageNet without extra compute.
- Nearest Neighbor Projection Removal Adversarial Training
  Nearest Neighbor Projection Removal Adversarial Training projects out inter-class dependencies in feature space during training, claims to reduce the Lipschitz constant and Rademacher complexity, and reports competitive robust accuracy on CIFAR-10, CIFAR-100, SVHN, and TinyImagenet.
- Improving Clean Accuracy via a Tangent-Space Perspective on Adversarial Training
  TART improves clean accuracy in adversarial training by modulating perturbation bounds according to the tangential component of adversarial examples.