arxiv: 1905.09186 · v1 · pith:5GKE74PFnew · submitted 2019-05-22 · cs.LG · cs.CR · stat.ML

Detecting Adversarial Examples and Other Misclassifications in Neural Networks by Introspection

classification cs.LG cs.CR stat.ML
keywords neural, misclassifications, network, adversarial, confidence, detect, introspection, networks

Despite excellent performance on a wide variety of tasks, modern neural networks are unable to provide a reliable confidence value that would allow misclassifications to be detected. This limitation is at the heart of what is known as an adversarial example, where the network assigns a wrong prediction with strong confidence to a slightly modified image. Moreover, this overconfidence issue has also been observed for regular errors and out-of-distribution data. We tackle this problem by what we call introspection, i.e. using the information provided by the logits of an already pretrained neural network. We show that by training a simple 3-layer neural network on top of the logit activations, we are able to detect misclassifications at a competitive level.
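The idea in the abstract, a small detector trained on the logits of a frozen classifier, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic logits stand in for a real pretrained network, "3-layer" is read as three weight layers (two hidden layers plus an output), and sorting the logits, the hidden width of 32, the learning rate, and the helper names (`make_synthetic_logits`, `init_params`, `forward`, `train`, `bce`) are all hypothetical choices.

```python
import numpy as np

def make_synthetic_logits(n, num_classes=10, seed=0):
    """Hypothetical stand-in for the logits of a pretrained classifier."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, num_classes, size=n)
    logits = rng.normal(0.0, 1.0, size=(n, num_classes))
    # Boost the true-class logit by a random amount; weak boosts cause errors.
    logits[np.arange(n), labels] += rng.uniform(0.0, 4.0, size=n)
    wrong = (logits.argmax(axis=1) != labels).astype(np.float64)
    return logits, wrong  # wrong[i] == 1.0 means example i is misclassified

def init_params(sizes, seed=0):
    """He-initialised weights and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every layer; the last is a sigmoid output."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        if i < len(params) - 1:
            acts.append(np.maximum(z, 0.0))                       # ReLU
        else:
            acts.append(1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0))))
    return acts

def bce(p, y):
    """Mean binary cross-entropy between probabilities p and targets y."""
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def train(params, x, y, lr=0.05, epochs=300):
    """Full-batch gradient descent on the binary cross-entropy."""
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        acts = forward(params, x)
        delta = (acts[-1] - y) / len(x)   # gradient w.r.t. pre-sigmoid output
        for i in reversed(range(len(params))):
            w, b = params[i]
            grad_w = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Backpropagate through the ReLU of the previous layer.
                delta = (delta @ w.T) * (acts[i] > 0)
            params[i] = (w - lr * grad_w, b - lr * grad_b)
    return params

logits, wrong = make_synthetic_logits(2000)
# Assumption: sort logits in descending order so the detector is class-agnostic.
features = -np.sort(-logits, axis=1)
features = (features - features.mean(axis=0)) / features.std(axis=0)

params = init_params([features.shape[1], 32, 32, 1])
loss_before = bce(forward(params, features)[-1].ravel(), wrong)
params = train(params, features, wrong)
probs = forward(params, features)[-1].ravel()  # estimated P(misclassified)
loss_after = bce(probs, wrong)
print(f"detector loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

In practice the detector would be trained on logits collected from a held-out set of the real classifier, including adversarial or out-of-distribution inputs if those are to be detected, and evaluated on separate data rather than on its own training set as done here.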

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LogitDynamics: Reliable ViT Error Detection from Layerwise Logit Trajectories

    cs.CV 2026-04 unverdicted novelty 6.0

    LogitDynamics detects errors in ViT predictions by training a linear probe on layerwise logit values and top-class instability statistics extracted via auxiliary heads.