Knowledge distillation based on transformed teacher matching
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (2)
polarities: background (2)
citing papers explorer
- Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer
  GDPD treats partial student features as degraded observations and uses a learned diffusion prior over teacher features to sample restorative long-context targets for improved partial time-series classification.
- BicKD: Bilateral Contrastive Knowledge Distillation
  BicKD introduces a bilateral contrastive loss in knowledge distillation that strengthens class-wise orthogonality and intra-class consistency in predictive distributions, outperforming prior logit-based methods.
- LiteMedCoT-VL: Parameter-Efficient Adaptation for Medical Visual Question Answering
  LiteMedCoT-VL distills chain-of-thought from a 235B model to 2B VLMs via LoRA, reaching 64.9% accuracy on PMC-VQA and beating a 4B zero-shot baseline by 11 points.
- EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer
  EGAD adaptively distills LLM knowledge at the token level by using entropy to create a curriculum from low- to high-entropy tokens, adjust temperature, and switch between logits-only and feature-based branches.