pith. sign in

arxiv: 2106.01489 · v3 · pith:GYFDKWM3new · submitted 2021-06-02 · 💻 cs.LG · cs.AI· cs.CV

Not All Knowledge Is Created Equal: Mutual Distillation of Confident Knowledge

classification 💻 cs.LG cs.AIcs.CV
keywords knowledgemodeldistillationreliableselectionunderlinecmd-pexample
0
0 comments X
read the original abstract

Mutual knowledge distillation (MKD) improves a model by distilling knowledge from another model. However, \textit{not all knowledge is certain and correct}, especially under adverse conditions. For example, label noise usually leads to less reliable models due to undesired memorization \cite{zhang2017understanding,arpit2017closer}. Wrong knowledge misleads the learning rather than helps. This problem can be handled by two aspects: (i) improving the reliability of a model where the knowledge is from (i.e., knowledge source's reliability); (ii) selecting reliable knowledge for distillation. In the literature, making a model more reliable is widely studied while selective MKD receives little attention. Therefore, we focus on studying selective MKD. Concretely, a generic MKD framework, \underline{C}onfident knowledge selection followed by \underline{M}utual \underline{D}istillation (CMD), is designed. The key component of CMD is a generic knowledge selection formulation, making the selection threshold either static (CMD-S) or progressive (CMD-P). Additionally, CMD covers two special cases: zero-knowledge and all knowledge, leading to a unified MKD framework. Extensive experiments are present to demonstrate the effectiveness of CMD and thoroughly justify the design of CMD. For example, CMD-P obtains new state-of-the-art results in robustness against label noise.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FLAME: Condensing Ensemble Diversity into a Single Network for Efficient Sequential Recommendation

    cs.IR 2026-04 conditional novelty 6.0

    FLAME condenses ensemble diversity into a single network via modular ensemble simulation and guided mutual learning during training, delivering ensemble-level performance with single-network inference speed on sequent...