Comparing kullback-leibler divergence and mean squared error loss in knowledge distillation

Taehyeon Kim, Jaehoon Oh, NakYil Kim, Sangwook Cho, Se-Young Yun · 2021 · arXiv 2105.08919

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Learning Through Noise: Why Subliminal Learning Works and When It Fails

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

Subliminal learning occurs via compatible auxiliary and class output heads on task-unrelated inputs, even with random hidden layers or architecture changes, with theory and upper bounds on failure.

PACED: Distillation and On-Policy Self-Distillation at the Frontier of Student Competence

cs.AI · 2026-03-11 · conditional · novelty 7.0

PACED applies student pass-rate weighting w(p)=p(1-p) to distillation, concentrating on the zone of proximal development and delivering up to +8.2 gains on AIME tasks with reduced forgetting.

Direct-to-Event Spiking Neural Network Transfer

cs.NE · 2026-05-08 · unverdicted · novelty 6.0

This work provides the first systematic study of transferring direct-coded spiking neural networks to event-based representations while aiming to preserve accuracy and reduce energy use.

CLIP-RD: Relative Distillation for Efficient CLIP Knowledge Distillation

cs.CV · 2026-03-26 · unverdicted · novelty 6.0

CLIP-RD adds VRD for cross-modality distillation consistency and XRD for bidirectional cross-modal symmetry to align student embedding geometry more closely with the teacher, yielding a 0.8 percentage point gain over prior distillation methods.

citing papers explorer

Showing 4 of 4 citing papers.

Learning Through Noise: Why Subliminal Learning Works and When It Fails cs.LG · 2026-05-22 · unverdicted · none · ref 20
Subliminal learning occurs via compatible auxiliary and class output heads on task-unrelated inputs, even with random hidden layers or architecture changes, with theory and upper bounds on failure.
PACED: Distillation and On-Policy Self-Distillation at the Frontier of Student Competence cs.AI · 2026-03-11 · conditional · none · ref 5
PACED applies student pass-rate weighting w(p)=p(1-p) to distillation, concentrating on the zone of proximal development and delivering up to +8.2 gains on AIME tasks with reduced forgetting.
Direct-to-Event Spiking Neural Network Transfer cs.NE · 2026-05-08 · unverdicted · none · ref 43
This work provides the first systematic study of transferring direct-coded spiking neural networks to event-based representations while aiming to preserve accuracy and reduce energy use.
CLIP-RD: Relative Distillation for Efficient CLIP Knowledge Distillation cs.CV · 2026-03-26 · unverdicted · none · ref 26
CLIP-RD adds VRD for cross-modality distillation consistency and XRD for bidirectional cross-modal symmetry to align student embedding geometry more closely with the teacher, yielding a 0.8 percentage point gain over prior distillation methods.

Comparing kullback-leibler divergence and mean squared error loss in knowledge distillation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer