What makes ImageNet good for transfer learning?
The tremendous success of ImageNet-trained deep features on a wide range of transfer tasks begs the question: what are the properties of the ImageNet dataset that are critical for learning good, general-purpose features? This work provides an empirical investigation of various facets of this question: Is more pre-training data always better? How does feature quality depend on the number of training examples per class? Does adding more object classes improve performance? For the same data budget, how should the data be split into classes? Is fine-grained recognition necessary for learning good features? Given the same number of training classes, is it better to have coarse classes or fine-grained classes? Which is better: more classes or more examples per class? To answer these and related questions, we pre-trained CNN features on various subsets of the ImageNet dataset and evaluated transfer performance on PASCAL detection, PASCAL action classification, and SUN scene classification tasks. Our overall findings suggest that most changes in the choice of pre-training data long thought to be critical do not significantly affect transfer performance.
Forward citations
Cited by 5 Pith papers
- Low Rank Adaptation for Adversarial Perturbation
  Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.
- Decouple then Converge: Handling Unknown Unlabeled Distributions in Long-Tailed Semi-Supervised Learning
  DeCon decouples LTSSL into head-class and tail-class branches that interact and converge, delivering SOTA accuracy on mismatched-distribution benchmarks and outperforming prior methods even on matched distributions.
- A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
  VTAB is a 19-task benchmark that measures representation quality by few-shot adaptation performance across diverse vision domains, with a controlled large-scale comparison of popular pretraining methods.
- How Class Ontology and Data Scale Affect Audio Transfer Learning
  Larger pre-training data scale and class diversity improve audio transfer learning performance, yet similarity between pre-training and target task has a stronger positive effect.
- Growing a Brain: Fine-Tuning by Increasing Model Capacity
  Growing CNN capacity by widening or deepening layers with normalized new units outperforms standard fine-tuning on vision benchmarks.