Derives upper and lower generalization bounds for the student relative to the teacher using a new distillation divergence, plus a loss-sharpness-aware bound and a bias-variance-rank decomposition in the linear Gaussian case.
Generalization bounds via distillation,
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2representative citing papers
Neural regression collapse occurs when last-layer feature intrinsic dimension falls below target intrinsic dimension, creating over-compressed and under-compressed regimes that govern generalization based on data quantity and noise.
citing papers explorer
-
On the Generalization of Knowledge Distillation: An Information-Theoretic View
Derives upper and lower generalization bounds for the student relative to the teacher using a new distillation divergence, plus a loss-sharpness-aware bound and a bias-variance-rank decomposition in the linear Gaussian case.
-
Geometric Analysis of Neural Regression Collapse via Intrinsic Dimension
Neural regression collapse occurs when last-layer feature intrinsic dimension falls below target intrinsic dimension, creating over-compressed and under-compressed regimes that govern generalization based on data quantity and noise.