Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks
Convolutional Neural Networks have achieved state-of-the-art performance on a wide range of tasks. Most benchmarks are led by ensembles of these powerful learners, but ensembling is typically treated as a post-hoc procedure implemented by averaging independently trained models with model variation induced by bagging or random initialization. In this paper, we rigorously treat ensembling as a first-class problem to explicitly address the question: what are the best strategies to create an ensemble? We first compare a large number of ensembling strategies, and then propose and evaluate novel strategies, such as parameter sharing (through a new family of models we call TreeNets) as well as training under ensemble-aware and diversity-encouraging losses. We demonstrate that TreeNets can improve ensemble performance and that diverse ensembles can be trained end-to-end under a unified loss, achieving significantly higher "oracle" accuracies than classical ensembles.
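The gap between classical averaging and "oracle" accuracy can be illustrated with a small NumPy sketch. It assumes that oracle accuracy means the fraction of examples for which at least one ensemble member predicts the correct label; the abstract does not spell out this definition, so treat it as an assumption. The function names, array shapes, and toy probabilities are hypothetical and chosen only for illustration.

```python
import numpy as np


def ensemble_mean_accuracy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Accuracy of the classical ensemble: average member probabilities, then argmax.

    probs has shape (M, N, C): M members, N examples, C classes.
    """
    mean_probs = probs.mean(axis=0)                      # (N, C)
    return float((mean_probs.argmax(axis=1) == labels).mean())


def oracle_accuracy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Assumed definition: an example counts as correct if ANY member gets it right."""
    member_preds = probs.argmax(axis=2)                  # (M, N)
    any_correct = (member_preds == labels[None, :]).any(axis=0)
    return float(any_correct.mean())


# Toy data (made up): 2 members, 3 examples, 2 classes.
probs = np.array([
    [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],   # member 0 predicts classes 0, 0, 1
    [[0.3, 0.7], [0.1, 0.9], [0.7, 0.3]],   # member 1 predicts classes 1, 1, 0
])
labels = np.array([0, 1, 0])

print("averaged ensemble accuracy:", ensemble_mean_accuracy(probs, labels))
print("oracle accuracy:", oracle_accuracy(probs, labels))
```

In this toy case the two members disagree on every example, so some member is always right even though the averaged prediction misses the third example. Under the assumed definition, that is the kind of diversity an oracle metric rewards.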
Forward citations
Cited by 2 Pith papers
- Anatomy of a failure: When, how, and why deep vision fails in scientific domains
  Deep learning on information-rich scientific images collapses to one-dimensional predictions due to a mismatch between data priors and the model's simplicity bias, even after robustification techniques.
- As easy as 1, 2... 4? Uncertainty in counting tasks for medical imaging
  A multi-task network is introduced to generate narrow predictive intervals for counts in medical images while maintaining target coverage, tested on cell and white matter hyperintensity counting.