Unifying distillation and privileged information

David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik · 2015 · stat.ML · arXiv 1511.03643

10 Pith papers cite this work. Polarity classification is still indexing.

abstract

Distillation (Hinton et al., 2015) and privileged information (Vapnik & Izmailov, 2015) are two techniques that enable machines to learn from other machines. This paper unifies these two techniques into generalized distillation, a framework to learn from multiple machines and data representations. We provide theoretical and causal insight about the inner workings of generalized distillation, extend it to unsupervised, semisupervised and multitask learning scenarios, and illustrate its efficacy on a variety of numerical simulations on both synthetic and real-world data.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

RPM-Distill: Physiology-guided Adaptive Cross-modal Distillation for Robust Remote Physiological Measurement

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

RPM-Distill uses synchronized radar only at training time to distill spectral periodic features into a video model via adaptive per-sample gating, yielding 81% lower MAE on remote physiological measurement tasks.

On the Generalization of Knowledge Distillation: An Information-Theoretic View

cs.IT · 2026-05-13 · unverdicted · novelty 7.0

Derives upper and lower generalization bounds for the student relative to the teacher using a new distillation divergence, plus a loss-sharpness-aware bound and a bias-variance-rank decomposition in the linear Gaussian case.

Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

CoT distillation frequently degrades student performance versus pre-distillation baselines, and capacity gap effects do not consistently dominate under a realistic protocol that includes original baselines.

ResAware: Cross-Environment Website Fingerprinting via Resource-Privileged Distillation

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

ResAware improves cross-environment website fingerprinting robustness by distilling resource-privileged knowledge into a traffic-only student model, raising Var-CNN F1 from 72.77% to 81.49% under 150-day drift on a 160k-sample dataset.

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0 · 2 refs

Search-E1 uses GRPO interleaved with on-policy self-distillation to reach 0.440 average EM on seven QA benchmarks with Qwen2.5-3B, outperforming open-source baselines.

SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

SD-Search derives step-level supervision for search queries in reasoning agents via on-policy hindsight self-distillation using the policy as both student and teacher.

PEPR: Privileged Event-based Predictive Regularization for Domain Generalization

cs.CV · 2026-02-04 · unverdicted · novelty 6.0

PEPR reframes learning with privileged event data as predicting latent event features from RGB to improve domain generalization in object detection and segmentation without direct cross-modal alignment.

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Anti-Self-Distillation reverses self-distillation signals via PMI to fix overconfidence on structural tokens, matching GRPO baseline accuracy 2-10x faster with up to 11.5 point gains across 4B-30B models.

LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

LiteGUI trains 2B/3B-scale GUI agents via SFT-free guided on-policy distillation and multi-solution dual-level GRPO to reach SOTA lightweight performance and compete with larger models.

Neurosymbolic Imitation Learning with Human Guidance: A Privileged Information Approach

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

A neurosymbolic imitation learning approach uses privileged gaze data during training to handle high-dimensional inputs while achieving better generalization than pure neural or symbolic methods.

citing papers explorer

RPM-Distill: Physiology-guided Adaptive Cross-modal Distillation for Robust Remote Physiological Measurement
cs.CV · 2026-06-26 · unverdicted · none · ref 32 · internal anchor
On the Generalization of Knowledge Distillation: An Information-Theoretic View
cs.IT · 2026-05-13 · unverdicted · none · ref 6 · internal anchor
Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective
cs.LG · 2026-04-10 · unverdicted · none · ref 3
ResAware: Cross-Environment Website Fingerprinting via Resource-Privileged Distillation
cs.LG · 2026-06-16 · unverdicted · none · ref 25 · internal anchor
Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning
cs.AI · 2026-05-21 · unverdicted · none · ref 8 · 2 links · internal anchor
SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning
cs.AI · 2026-05-18 · unverdicted · none · ref 16 · internal anchor
PEPR: Privileged Event-based Predictive Regularization for Domain Generalization
cs.CV · 2026-02-04 · unverdicted · none · ref 34 · internal anchor
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
cs.LG · 2026-05-12 · unverdicted · none · ref 16
LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
cs.AI · 2026-05-08 · unverdicted · none · ref 41
Neurosymbolic Imitation Learning with Human Guidance: A Privileged Information Approach
cs.LG · 2026-05-08 · unverdicted · none · ref 24
