Knowledge distillation based on transformed teacher matching

2024 · arXiv 2402.11148

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

GDPD treats partial student features as degraded observations and uses a learned diffusion prior over teacher features to sample restorative long-context targets for improved partial time-series classification.

BicKD: Bilateral Contrastive Knowledge Distillation

cs.LG · 2026-02-01 · unverdicted · novelty 6.0

BicKD introduces a bilateral contrastive loss in knowledge distillation that strengthens class-wise orthogonality and intra-class consistency in predictive distributions, outperforming prior logit-based methods.

LiteMedCoT-VL: Parameter-Efficient Adaptation for Medical Visual Question Answering

cs.CV · 2026-05-10 · unverdicted · novelty 5.0

LiteMedCoT-VL distills chain-of-thought from a 235B model to 2B VLMs via LoRA, reaching 64.9% accuracy on PMC-VQA and beating a 4B zero-shot baseline by 11 points.

EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer

cs.CL · 2026-05-03 · unverdicted · novelty 5.0

EGAD adaptively distills LLM knowledge at the token level by using entropy to create a curriculum from low- to high-entropy tokens, adjust temperature, and switch between logits-only and feature-based branches.

citing papers explorer

Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer · cs.LG · 2026-05-12 · unverdicted · none · ref 14
BicKD: Bilateral Contrastive Knowledge Distillation · cs.LG · 2026-02-01 · unverdicted · none · ref 19
LiteMedCoT-VL: Parameter-Efficient Adaptation for Medical Visual Question Answering · cs.CV · 2026-05-10 · unverdicted · none · ref 35
EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer · cs.CL · 2026-05-03 · unverdicted · none · ref 49
