pith. machine review for the scientific record

arXiv: 1806.04606 · v2 · submitted 2018-06-12 · 💻 cs.CV

Recognition: unknown

Knowledge Distillation by On-the-Fly Native Ensemble

Authors on Pith: no claims yet
classification 💻 cs.CV
keywords distillation · knowledge · network · teacher · ensemble · methods · native · on-the-fly
abstract

Knowledge distillation is effective for training small, generalisable network models that meet low-memory and fast-execution requirements. Existing offline distillation methods rely on a strong pre-trained teacher, which enables favourable knowledge discovery and transfer but requires a complex two-phase training procedure. Online counterparts address this limitation at the price of lacking a high-capacity teacher. In this work, we present an On-the-fly Native Ensemble (ONE) strategy for one-stage online distillation. Specifically, ONE trains only a single multi-branch network while simultaneously establishing a strong teacher on-the-fly to enhance the learning of the target network. Extensive evaluations show that ONE improves the generalisation performance of a variety of deep neural networks more significantly than alternative methods on four image classification datasets: CIFAR10, CIFAR100, SVHN, and ImageNet, whilst also offering computational efficiency advantages.
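The abstract sketches the mechanism: the branch logits of one multi-branch network are aggregated by a learned gate into an on-the-fly teacher, which is distilled back into every branch within the same training pass. Below is a minimal PyTorch sketch of that idea, not the authors' released code; the names (`ONEModel`, `num_branches`, the gating layer) and details such as the temperature `T=3.0` and detaching the teacher in the KL term are illustrative assumptions.

```python
# Minimal sketch of one-stage online distillation in the spirit of ONE.
# All module names and the toy backbone are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ONEModel(nn.Module):
    def __init__(self, num_classes=100, num_branches=3, feat_dim=64):
        super().__init__()
        # Shared low-level layers (a stand-in backbone stem).
        self.shared = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.BatchNorm2d(feat_dim),
            nn.ReLU(inplace=True), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Parallel high-level branches, each with its own classifier.
        self.branches = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(num_branches)
        )
        # Gating module: per-sample importance weights over branches.
        self.gate = nn.Linear(feat_dim, num_branches)

    def forward(self, x):
        feat = self.shared(x)
        logits = torch.stack([b(feat) for b in self.branches], dim=1)  # (B, K, C)
        g = F.softmax(self.gate(feat), dim=1)                          # (B, K)
        teacher = (g.unsqueeze(-1) * logits).sum(dim=1)                # (B, C)
        return logits, teacher

def one_loss(logits, teacher, targets, T=3.0):
    # Cross-entropy for every branch and for the gated ensemble teacher,
    # plus a KL term distilling the softened teacher into each branch.
    # Detaching the teacher in the KL term is one common choice here.
    ce = sum(F.cross_entropy(logits[:, k], targets) for k in range(logits.size(1)))
    ce = ce + F.cross_entropy(teacher, targets)
    soft_t = F.softmax(teacher.detach() / T, dim=1)
    kl = sum(
        F.kl_div(F.log_softmax(logits[:, k] / T, dim=1), soft_t,
                 reduction="batchmean") * T * T
        for k in range(logits.size(1))
    )
    return ce + kl
```

Since the abstract emphasises small deployable models, a natural choice at test time is to keep a single trained branch plus the shared stem and drop the gate and remaining branches.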

This paper has not been read by Pith yet.

discussion (0)
