pith:U3E6Y3MT
On the Generalization of Knowledge Distillation: An Information-Theoretic View
Knowledge distillation generalization is bounded by the KL divergence between teacher and student training kernels.
arxiv:2605.13143 v1 · 2026-05-13 · cs.IT · cs.LG · math.IT
Claims
We derive two generalization bounds for the student model relative to the teacher's generalization gap: an upper bound under a sub-Gaussian assumption via algorithmic stability, and a lower bound under a central condition with sharper dependence on the distillation divergence. We further develop a loss-sharpness-aware bound with an explicit tightness regime, showing that the teacher's local flatness can strictly tighten the bound.
Key assumptions: teacher and student training are modeled as coupled stochastic processes whose kernels admit a well-defined KL divergence; the stability upper bound requires a sub-Gaussian assumption; and the lower bound requires a central condition.
Knowledge distillation generalization bounds are derived via a new distillation divergence measuring teacher-student kernel difference, with tighter bounds from teacher loss flatness.
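To make the idea of a KL divergence between training kernels concrete, here is a minimal sketch under a toy assumption that is not taken from the paper: teacher and student each take one noisy gradient step from the same weights `w`, with the same learning rate `lr` and the same isotropic Gaussian noise scale `sigma`. In that case both one-step kernels are Gaussians with a shared covariance, and their KL divergence reduces to the squared distance between the means divided by `2 * sigma**2`. The function names and the quadratic losses are hypothetical and chosen only for illustration; the paper's own distillation divergence may be defined differently.

```python
def gaussian_kl_isotropic(mu_p, mu_q, sigma):
    """KL( N(mu_p, sigma^2 I) || N(mu_q, sigma^2 I) ) = ||mu_p - mu_q||^2 / (2 sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu_p, mu_q))
    return sq_dist / (2.0 * sigma ** 2)


def one_step_kernel_kl(w, grad_teacher, grad_student, lr, sigma):
    """KL between two noisy-gradient-step kernels started at the same point w.

    Toy assumption (not from the paper): each kernel is
    w' = w - lr * grad(w) + sigma * xi, with xi standard Gaussian noise.
    """
    mean_t = [wi - lr * g for wi, g in zip(w, grad_teacher(w))]
    mean_s = [wi - lr * g for wi, g in zip(w, grad_student(w))]
    return gaussian_kl_isotropic(mean_t, mean_s, sigma)


# Hypothetical quadratic losses: the teacher minimizes 0.5 * ||w||^2,
# the student minimizes 0.5 * ||w - c||^2 with c = (0.5, 0.0).
C = [0.5, 0.0]


def grad_teacher(w):
    return list(w)


def grad_student(w):
    return [wi - ci for wi, ci in zip(w, C)]


if __name__ == "__main__":
    kl = one_step_kernel_kl([1.0, -2.0], grad_teacher, grad_student, lr=0.1, sigma=0.1)
    print(f"one-step kernel KL: {kl:.6f}")
```

In this toy setup the two gradients differ by the constant vector `c`, so the divergence equals `lr**2 * ||c||**2 / (2 * sigma**2)`, which is about 0.125 for the values shown. The sketch only illustrates how a kernel-level KL can be small when teacher and student updates nearly agree; it says nothing about the constants in the paper's bounds.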
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:08:57.367582Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
a6c9ec6d9361451911eb6aed49d47b3c5ef3f6c0df5b1a0711cff4018c1b1596
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/U3E6Y3MTMFCRSEPLNLWUTVD3HR \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a6c9ec6d9361451911eb6aed49d47b3c5ef3f6c0df5b1a0711cff4018c1b1596
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ec92b10b055b4c51f8bfeb8b5d713232228d3308586cc7838a46d764aef1b4e0",
    "cross_cats_sorted": [
      "cs.LG",
      "math.IT"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.IT",
    "submitted_at": "2026-05-13T08:10:05Z",
    "title_canon_sha256": "9e2d525555f8b45ca44c8252dce441cbd5e0ead050b8a393bfa1ec933bbe24fd"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13143",
    "kind": "arxiv",
    "version": 1
  }
}
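The shell command above can also be reproduced offline. The following is a minimal sketch that applies the same canonicalization as the page's one-liner (sorted keys, no whitespace between tokens, UTF-8 encoding, then SHA-256) to the record shown above. The names `RECORD`, `EXPECTED`, `canonical_bytes`, and `canonical_hash` are illustrative and not part of any Pith tooling; whether the digest matches depends on the record being copied exactly.

```python
import hashlib
import json

# Canonical record copied from this page. Key order here does not matter,
# because sort_keys=True reorders keys during serialization.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "ec92b10b055b4c51f8bfeb8b5d713232228d3308586cc7838a46d764aef1b4e0",
        "cross_cats_sorted": ["cs.LG", "math.IT"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.IT",
        "submitted_at": "2026-05-13T08:10:05Z",
        "title_canon_sha256": "9e2d525555f8b45ca44c8252dce441cbd5e0ead050b8a393bfa1ec933bbe24fd",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13143", "kind": "arxiv", "version": 1},
}

# Canonical hash as published on this page.
EXPECTED = "a6c9ec6d9361451911eb6aed49d47b3c5ef3f6c0df5b1a0711cff4018c1b1596"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and compact separators, then encode as UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode()


def canonical_hash(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because the serialization sorts keys and strips whitespace, the indentation and key order used when displaying the record have no effect on the digest; only the keys and values themselves do.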