What makes ImageNet good for transfer learning?

Alexei A. Efros; Minyoung Huh; Pulkit Agrawal

arXiv: 1608.08614 · v2 · pith:IXRWK2OSnew · submitted 2016-08-30 · cs.CV · cs.AI · cs.LG

Minyoung Huh, Pulkit Agrawal, Alexei A. Efros

keywords: classes, better, data, features, transfer, class, examples, fine-grained

The tremendous success of ImageNet-trained deep features on a wide range of transfer tasks begs the question: what are the properties of the ImageNet dataset that are critical for learning good, general-purpose features? This work provides an empirical investigation of various facets of this question: Is more pre-training data always better? How does feature quality depend on the number of training examples per class? Does adding more object classes improve performance? For the same data budget, how should the data be split into classes? Is fine-grained recognition necessary for learning good features? Given the same number of training classes, is it better to have coarse classes or fine-grained classes? Which is better: more classes or more examples per class? To answer these and related questions, we pre-trained CNN features on various subsets of the ImageNet dataset and evaluated transfer performance on PASCAL detection, PASCAL action classification, and SUN scene classification tasks. Our overall findings suggest that most changes in the choice of pre-training data long thought to be critical do not significantly affect transfer performance.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Low Rank Adaptation for Adversarial Perturbation
cs.LG 2026-04 unverdicted novelty 7.0

Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.

Decouple then Converge: Handling Unknown Unlabeled Distributions in Long-Tailed Semi-Supervised Learning
cs.LG 2024-06 unverdicted novelty 6.0

DeCon decouples LTSSL into head-class and tail-class branches that interact and converge, delivering SOTA accuracy on mismatched-distribution benchmarks and outperforming prior methods even on matched distributions.

A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
cs.CV 2019-10 accept novelty 6.0

VTAB is a 19-task benchmark that measures representation quality by few-shot adaptation performance across diverse vision domains, with a controlled large-scale comparison of popular pretraining methods.

How Class Ontology and Data Scale Affect Audio Transfer Learning
cs.LG 2026-03 unverdicted novelty 5.0

Larger pre-training data scale and class diversity improve audio transfer learning performance, yet similarity between pre-training and target task has a stronger positive effect.

Growing a Brain: Fine-Tuning by Increasing Model Capacity
cs.CV 2019-07 unverdicted novelty 5.0

Growing CNN capacity by widening or deepening layers with normalized new units outperforms standard fine-tuning on vision benchmarks.