MNIST-C: A Robustness Benchmark for Computer Vision
read the original abstract
We introduce the MNIST-C dataset, a comprehensive suite of 15 corruptions applied to the MNIST test set, for benchmarking out-of-distribution robustness in computer vision. Through several experiments and visualizations we demonstrate that our corruptions significantly degrade performance of state-of-the-art computer vision models while preserving the semantic content of the test images. In contrast to the popular notion of adversarial robustness, our model-agnostic corruptions do not seek worst-case performance but are instead designed to be broad and diverse, capturing multiple failure modes of modern models. In fact, we find that several previously published adversarial defenses significantly degrade robustness as measured by MNIST-C. We hope that our benchmark serves as a useful tool for future work in designing systems that are able to learn robust feature representations that capture the underlying semantics of the input.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Empirical Insights of Test Selection Metrics under Multiple Testing Objectives and Distribution Shifts
A broad empirical benchmark shows how 15 existing test selection metrics perform for fault detection, performance estimation, and retraining under corrupted, adversarial, temporal, natural, and label shifts across ima...
-
Low Rank Based Subspace Inference for the Laplace Approximation of Bayesian Neural Networks
Derives optimal low-rank subspace for Laplace approx in BNNs, provides scalable outperforming version, and new comparison metric.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.