Adversarial Logit Pairing

2018 · cs.LG · arXiv:1803.06373

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

In this paper, we develop improved techniques for defending against adversarial examples at scale. First, we implement the state-of-the-art version of adversarial training at unprecedented scale on ImageNet and investigate whether it remains effective in this setting - an important open scientific question (Athalye et al., 2018). Next, we introduce enhanced defenses using a technique we call logit pairing, a method that encourages logits for pairs of examples to be similar. When applied to clean examples and their adversarial counterparts, logit pairing improves accuracy on adversarial examples over vanilla adversarial training; we also find that logit pairing on clean examples only is competitive with adversarial training in terms of accuracy on two datasets. Finally, we show that adversarial logit pairing achieves the state-of-the-art defense on ImageNet against PGD white-box attacks, with an accuracy improvement from 1.5% to 27.9%. Adversarial logit pairing also successfully damages the current state-of-the-art defense against black-box attacks on ImageNet (Tramer et al., 2018), dropping its accuracy from 66.6% to 47.1%. With this new accuracy drop, adversarial logit pairing ties with Tramer et al. (2018) for the state of the art on black-box attacks on ImageNet.
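The abstract describes logit pairing only as encouraging the logits of paired examples to be similar. As a rough illustration of that idea, the sketch below adds a similarity penalty between clean and adversarial logits to an adversarial-training cross-entropy. The abstract does not specify the distance or the weighting, so the squared L2 distance, the `pairing_weight` parameter, the equal clean/adversarial cross-entropy mix, and all function names are assumptions made for this example, not details confirmed by this page.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def pairing_penalty(logits_a, logits_b):
    """Squared L2 distance between two logit vectors (assumed distance)."""
    return sum((a - b) ** 2 for a, b in zip(logits_a, logits_b))


def alp_loss(clean_batch, adv_batch, labels, pairing_weight=0.5):
    """Mean over the batch of: averaged clean/adversarial cross-entropy
    plus a weighted penalty pulling the paired logits together.

    pairing_weight is a hypothetical hyperparameter for this sketch.
    """
    total = 0.0
    for clean, adv, y in zip(clean_batch, adv_batch, labels):
        ce = 0.5 * (cross_entropy(clean, y) + cross_entropy(adv, y))
        total += ce + pairing_weight * pairing_penalty(clean, adv)
    return total / len(labels)


# Usage: logits for two examples, each with a clean and an adversarial version.
clean_logits = [[2.0, 0.0], [0.0, 1.0]]
adv_logits = [[1.0, 1.0], [0.0, 1.0]]
labels = [0, 1]
loss = alp_loss(clean_logits, adv_logits, labels)
```

In a real training loop the adversarial logits would come from running the model on attacked inputs (for example PGD, as named in the abstract); here they are fixed lists so the loss arithmetic can be checked in isolation. Setting `pairing_weight=0` reduces the sketch to a plain mix of clean and adversarial cross-entropy.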

representative citing papers

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.

Invariance-inducing regularization using worst-case transformations suffices to boost accuracy and spatial robustness

cs.LG · 2019-06-26 · unverdicted · novelty 5.0

Invariance-inducing regularization using worst-case transformations reduces relative error by 20% on CIFAR10 transformed examples, improves standard accuracy on SVHN, outperforms equivariant networks, and proves no accuracy-robustness trade-off in the infinite data limit.

Using Intuition from Empirical Properties to Simplify Adversarial Training Defense

cs.LG · 2019-06-27 · unverdicted · novelty 4.0

Modifications to single-step adversarial training based on empirical properties of iterative methods improve accuracy by up to 16.93% against iterative attacks while reducing training cost by 28.75%.

citing papers explorer

Showing 3 of 3 citing papers.

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

cs.LG · 2026-05-14 · unverdicted · none · ref 42 · internal anchor

Invariance-inducing regularization using worst-case transformations suffices to boost accuracy and spatial robustness

cs.LG · 2019-06-26 · unverdicted · none · ref 21 · internal anchor

Using Intuition from Empirical Properties to Simplify Adversarial Training Defense

cs.LG · 2019-06-27 · unverdicted · none · ref 6 · internal anchor
