arxiv: 2007.00644 · v2 · pith:74KU26LUnew · submitted 2020-07-01 · 💻 cs.LG · cs.CV · stat.ML

Measuring Robustness to Natural Distribution Shifts in Image Classification

classification 💻 cs.LG cs.CV stat.ML
keywords distribution, robustness, natural, shifts, arising, current, data, shift

We study how robust current ImageNet models are to distribution shifts arising from natural variations in datasets. Most research on robustness focuses on synthetic image perturbations (noise, simulated weather artifacts, adversarial examples, etc.), which leaves open how robustness on synthetic distribution shift relates to distribution shift arising in real data. Informed by an evaluation of 204 ImageNet models in 213 different test conditions, we find that there is often little to no transfer of robustness from current synthetic to natural distribution shift. Moreover, most current techniques provide no robustness to the natural distribution shifts in our testbed. The main exception is training on larger and more diverse datasets, which in multiple cases increases robustness, but is still far from closing the performance gaps. Our results indicate that distribution shifts arising in real data are currently an open research problem. We provide our testbed and data as a resource for future work at https://modestyachts.github.io/imagenet-testbed/ .

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Toward Calibrated, Fair, and accurate Deepfake Detection

    cs.LG 2026-06 unverdicted novelty 7.0

    Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

  2. LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

    cs.LG 2025-02 unverdicted novelty 6.0

    Pretraining data determines loss-to-loss scaling laws in LLMs, while model size, optimization, tokenizer, and architecture have limited impact.