Adversarial Logit Pairing

arxiv: 1803.06373 · v1 · pith:P2MLO5Y7new · submitted 2018-03-16 · 💻 cs.LG · stat.ML

Adversarial Logit Pairing

Harini Kannan , Alexey Kurakin , Ian Goodfellow This is my paper

classification 💻 cs.LG stat.ML

keywords adversariallogitpairingaccuracyexamplesimagenetstateattacks

0 comments p. Extension

pith:P2MLO5Y7 Add to your LaTeX paper

What is a Pith Number?

\usepackage{pith}
\pithnumber{P2MLO5Y7}

Prints a linked pith:P2MLO5Y7 badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

read the original abstract

In this paper, we develop improved techniques for defending against adversarial examples at scale. First, we implement the state of the art version of adversarial training at unprecedented scale on ImageNet and investigate whether it remains effective in this setting - an important open scientific question (Athalye et al., 2018). Next, we introduce enhanced defenses using a technique we call logit pairing, a method that encourages logits for pairs of examples to be similar. When applied to clean examples and their adversarial counterparts, logit pairing improves accuracy on adversarial examples over vanilla adversarial training; we also find that logit pairing on clean examples only is competitive with adversarial training in terms of accuracy on two datasets. Finally, we show that adversarial logit pairing achieves the state of the art defense on ImageNet against PGD white box attacks, with an accuracy improvement from 1.5% to 27.9%. Adversarial logit pairing also successfully damages the current state of the art defense against black box attacks on ImageNet (Tramer et al., 2018), dropping its accuracy from 66.6% to 47.1%. With this new accuracy drop, adversarial logit pairing ties with Tramer et al.(2018) for the state of the art on black box attacks on ImageNet.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
cs.LG 2026-05 unverdicted novelty 5.0

Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.