Unifying distillation and privileged information
Abstract
Distillation (Hinton et al., 2015) and privileged information (Vapnik & Izmailov, 2015) are two techniques that enable machines to learn from other machines. This paper unifies these two techniques into generalized distillation, a framework to learn from multiple machines and data representations. We provide theoretical and causal insight about the inner workings of generalized distillation, extend it to unsupervised, semisupervised and multitask learning scenarios, and illustrate its efficacy on a variety of numerical simulations on both synthetic and real-world data.
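To make the abstract's framework concrete, here is a minimal NumPy sketch of the generalized distillation recipe the paper describes: a teacher is first fit on the privileged representation x*, its temperature-softened predictions become soft labels, and a student that sees only the regular representation x minimizes a weighted blend of the hard-label and soft-label cross-entropies. This is a sketch, not the authors' code; the function names, the convention that lam weights the hard-label term, and the toy data are assumptions for illustration.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer labels."""
    z = z / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(targets, probs, eps=1e-12):
    """Mean cross-entropy between target distributions and predicted probs."""
    return -np.mean(np.sum(targets * np.log(probs + eps), axis=1))

def generalized_distillation_loss(student_logits, y_onehot, teacher_logits,
                                  temperature=2.0, lam=0.5):
    """Blend hard-label loss with imitation of the teacher's soft labels.

    lam = 1 recovers ordinary supervised learning; lam = 0 is pure
    distillation of a teacher trained on the privileged representation.
    """
    student_probs = softmax(student_logits)
    soft_labels = softmax(teacher_logits, temperature=temperature)
    hard_term = cross_entropy(y_onehot, student_probs)
    soft_term = cross_entropy(soft_labels, student_probs)
    return lam * hard_term + (1.0 - lam) * soft_term

# Illustrative usage with random logits standing in for trained models.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(5, 3))  # teacher saw privileged features x*
student_logits = rng.normal(size=(5, 3))  # student sees only regular features x
y_onehot = np.eye(3)[rng.integers(0, 3, size=5)]
print(generalized_distillation_loss(student_logits, y_onehot, teacher_logits))
```

In this reading, classical distillation is the case where x* and x coincide and the teacher is simply a larger model, while learning with privileged information is the case where the teacher's advantage comes from the richer representation x* available only at training time.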
Forward citations
Cited by 4 Pith papers
- Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective
  CoT distillation frequently degrades student performance versus pre-distillation baselines, and capacity-gap effects do not consistently dominate under a realistic protocol that includes original baselines.
- Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
  Anti-Self-Distillation reverses self-distillation signals via pointwise mutual information (PMI; a generic sketch follows this list) to fix overconfidence on structural tokens, matching GRPO baseline accuracy 2-10x faster with up to 11.5-point gains across 4B-30B models.
- LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
  LiteGUI trains 2B/3B-scale GUI agents via SFT-free guided on-policy distillation and multi-solution dual-level GRPO to reach state-of-the-art lightweight performance and compete with larger models.
- Neurosymbolic Imitation Learning with Human Guidance: A Privileged Information Approach
  A neurosymbolic imitation learning approach uses privileged gaze data during training to handle high-dimensional inputs while achieving better generalization than pure neural or symbolic methods.
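The PMI referenced in the second citation is, in the usual language-model sense, the log-ratio between a token's conditional and marginal probabilities. The sketch below shows only that generic quantity; it is not the cited paper's method, and all names and probabilities are illustrative.

```python
import math

def pmi(p_y_given_x: float, p_y: float) -> float:
    """Pointwise mutual information: log p(y | x) - log p(y).

    Near zero for structural tokens that are likely in any context,
    large for tokens the context makes much more probable.
    """
    return math.log(p_y_given_x) - math.log(p_y)

# A delimiter-like token that is likely regardless of context:
print(pmi(0.90, 0.85))  # ~0.06, context adds little information
# A token whose probability is driven by the reasoning context:
print(pmi(0.60, 0.05))  # ~2.48, strongly context-dependent
```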