MNIST-C: A Robustness Benchmark for Computer Vision

Justin Gilmer; Norman Mu

arxiv: 1906.02337 · v1 · pith:KPOPMKYN · submitted 2019-06-05 · 💻 cs.CV · cs.LG

keywords: robustness, computer, corruptions, mnist-c, vision, adversarial, benchmark, degrade

Abstract

We introduce the MNIST-C dataset, a comprehensive suite of 15 corruptions applied to the MNIST test set, for benchmarking out-of-distribution robustness in computer vision. Through several experiments and visualizations we demonstrate that our corruptions significantly degrade performance of state-of-the-art computer vision models while preserving the semantic content of the test images. In contrast to the popular notion of adversarial robustness, our model-agnostic corruptions do not seek worst-case performance but are instead designed to be broad and diverse, capturing multiple failure modes of modern models. In fact, we find that several previously published adversarial defenses significantly degrade robustness as measured by MNIST-C. We hope that our benchmark serves as a useful tool for future work in designing systems that are able to learn robust feature representations that capture the underlying semantics of the input.
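The abstract describes the benchmark only at a high level. As a rough illustration of how a corruption benchmark of this kind is typically used, the sketch below applies a few image corruptions to test images and compares a classifier's clean accuracy with its accuracy on each corrupted copy. The three corruption functions (`impulse_noise`, `translate`, `brightness`), their default parameters, the `evaluate` helper, and the unnormalized mean-error summary are hypothetical simplifications written for this example. They are not the official MNIST-C code or severity settings, and the full benchmark uses 15 corruptions.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical image type: a 28x28 grayscale digit with pixel values in [0, 1].
Image = List[List[float]]
Corruption = Callable[[Image], Image]
Model = Callable[[Image], int]


def clip01(x: float) -> float:
    """Clamp a pixel value to the valid [0, 1] range."""
    return min(1.0, max(0.0, x))


def impulse_noise(img: Image, amount: float = 0.05, seed: int = 0) -> Image:
    """Simplified salt-and-pepper noise: each pixel becomes 0 or 1 with probability `amount`."""
    rng = random.Random(seed)  # seeded so the corrupted test set is reproducible
    out = []
    for row in img:
        new_row = []
        for p in row:
            if rng.random() < amount:
                new_row.append(float(rng.random() < 0.5))  # salt (1.0) or pepper (0.0)
            else:
                new_row.append(p)
        out.append(new_row)
    return out


def translate(img: Image, dx: int = 2, dy: int = 0) -> Image:
    """Shift the digit by (dx, dy) pixels, filling vacated pixels with black (0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def brightness(img: Image, delta: float = 0.3) -> Image:
    """Add a constant offset to every pixel and clip back to [0, 1]."""
    return [[clip01(p + delta) for p in row] for row in img]


def accuracy(model: Model, images: Sequence[Image], labels: Sequence[int]) -> float:
    """Fraction of images whose predicted label matches the true label."""
    correct = sum(model(img) == y for img, y in zip(images, labels))
    return correct / len(labels)


def evaluate(
    model: Model,
    images: Sequence[Image],
    labels: Sequence[int],
    corruptions: Dict[str, Corruption],
) -> Dict[str, float]:
    """Report clean accuracy, accuracy under each corruption, and the plain
    (unnormalized) mean error across corruptions."""
    results = {"clean": accuracy(model, images, labels)}
    for name, corrupt in corruptions.items():
        corrupted = [corrupt(img) for img in images]  # labels are unchanged
        results[name] = accuracy(model, corrupted, labels)
    errors = [1.0 - results[name] for name in corruptions]
    results["mean_corruption_error"] = sum(errors) / len(errors)
    return results
```

Usage, assuming `model` is any callable that maps a 28×28 image to a digit label: `evaluate(model, test_images, test_labels, {"impulse_noise": impulse_noise, "translate": translate, "brightness": brightness})`. The corruptions change the pixels but not the labels, so a large gap between the `"clean"` entry and the per-corruption entries signals the kind of degradation the abstract reports.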

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Empirical Insights of Test Selection Metrics under Multiple Testing Objectives and Distribution Shifts
cs.SE 2026-04 unverdicted novelty 6.0

A broad empirical benchmark shows how 15 existing test selection metrics perform for fault detection, performance estimation, and retraining under corrupted, adversarial, temporal, natural, and label shifts across ima...
Low Rank Based Subspace Inference for the Laplace Approximation of Bayesian Neural Networks
cs.LG 2025-02 unverdicted novelty 6.0

Derives the optimal low-rank subspace for the Laplace approximation in Bayesian neural networks (BNNs), provides a scalable, better-performing version, and introduces a new comparison metric.