pith. machine review for the scientific record.

arxiv: 1706.00384 · v1 · submitted 2017-06-01 · 💻 cs.CV

Recognition: unknown

Deep Mutual Learning

Authors on Pith: no claims yet
classification 💻 cs.CV
keywords network, learning, mutual, teacher, powerful, student, transfer, deep
original abstract

Model distillation is an effective and widely used technique to transfer knowledge from a teacher to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network that is better suited to low-memory or fast execution requirements. In this paper, we present a deep mutual learning (DML) strategy where, rather than one-way transfer between a static pre-defined teacher and a student, an ensemble of students learn collaboratively and teach each other throughout the training process. Our experiments show that a variety of network architectures benefit from mutual learning and achieve compelling results on CIFAR-100 recognition and Market-1501 person re-identification benchmarks. Surprisingly, it is revealed that no prior powerful teacher network is necessary -- mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher.
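The objective the abstract describes, in which each student minimizes its own supervised loss plus a KL term pulling it toward a peer's predictions, is compact enough to sketch. Below is a minimal PyTorch sketch of a two-student cohort; the toy MLPs, random data, and the simultaneous (rather than alternating) update are illustrative assumptions, not the paper's exact training setup.

```python
# Minimal sketch of two-student deep mutual learning:
# loss_i = CE(logits_i, y) + KL(p_j || p_i), where p_j is the peer's
# (detached) predictive distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mutual_losses(logits1, logits2, y):
    """DML losses for a two-student cohort. Each peer's distribution is
    detached so it acts as a fixed target for the other's update."""
    kl_1 = F.kl_div(F.log_softmax(logits1, dim=1),
                    F.softmax(logits2.detach(), dim=1),
                    reduction="batchmean")  # D_KL(p2 || p1)
    kl_2 = F.kl_div(F.log_softmax(logits2, dim=1),
                    F.softmax(logits1.detach(), dim=1),
                    reduction="batchmean")  # D_KL(p1 || p2)
    loss1 = F.cross_entropy(logits1, y) + kl_1
    loss2 = F.cross_entropy(logits2, y) + kl_2
    return loss1, loss2

# Toy demonstration on random data (the paper evaluates on CIFAR-100
# and Market-1501 with real CNN architectures).
net1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
net2 = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt1 = torch.optim.SGD(net1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(net2.parameters(), lr=0.1)

x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
for _ in range(5):
    loss1, loss2 = mutual_losses(net1(x), net2(x), y)
    opt1.zero_grad(); opt2.zero_grad()
    # Single joint backward; the detach() calls keep each KL term from
    # sending gradients into the peer network.
    (loss1 + loss2).backward()
    opt1.step(); opt2.step()
```

Because each KL target is detached, every student sees its peers as moving teachers: no pre-trained teacher is required, which is the paper's central observation.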

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework

    cs.IR · 2026-03 · unverdicted · novelty 4.0

    OneSearch-V2 improves generative retrieval via latent reasoning and self-distillation, achieving +3.98% item CTR, +2.07% buyer volume, and +2.11% order volume in online A/B tests.