pith. machine review for the scientific record.
Pith Number

pith:U3E6Y3MT

pith:2026:U3E6Y3MTMFCRSEPLNLWUTVD3HR
not attested · not anchored · not stored · refs resolved

On the Generalization of Knowledge Distillation: An Information-Theoretic View

Bingying Li, Haiyun He

Knowledge distillation generalization is bounded by the KL divergence between teacher and student training kernels.

arxiv:2605.13143 v1 · 2026-05-13 · cs.IT · cs.LG · math.IT

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

We derive two generalization bounds for the student model relative to the teacher's generalization gap: an upper bound under a sub-Gaussian assumption via algorithmic stability, and a lower bound under a central condition with sharper dependence on the distillation divergence. We further develop a loss-sharpness-aware bound with an explicit tightness regime, showing that the teacher's local flatness can strictly tighten the bound.

C2 weakest assumption

The modeling of teacher and student training as coupled stochastic processes whose kernels admit a well-defined KL divergence; the sub-Gaussian assumption required for the stability upper bound; and the central condition required for the lower bound.

C3 one line summary

Knowledge distillation generalization bounds are derived via a new distillation divergence measuring teacher-student kernel difference, with tighter bounds from teacher loss flatness.

References

27 extracted · 27 resolved · 2 Pith anchors

[1] Distilling the Knowledge in a Neural Network 2015 · arXiv:1503.02531
[2] Y. Zhang, T. Xiang, T. M. Hospedales, and H. Lu, “Deep mutual learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
[3] In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2017 · doi:10.1145/3097983
[4] Towards understanding knowledge distillation, 2019
[5] Do deep nets really need to be deep? 2014
Receipt and verification
First computed 2026-05-18T03:08:57.367582Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

a6c9ec6d9361451911eb6aed49d47b3c5ef3f6c0df5b1a0711cff4018c1b1596

Aliases

arxiv: 2605.13143 · arxiv_version: 2605.13143v1 · doi: 10.48550/arxiv.2605.13143 · pith_short_12: U3E6Y3MTMFCR · pith_short_16: U3E6Y3MTMFCRSEPL · pith_short_8: U3E6Y3MT
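The record does not say how the identifier and its short aliases are derived. For this record, however, the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the `pith_short_*` aliases are prefixes of it. The sketch below reproduces that relationship. The function name `pith_number_from_hash` is hypothetical, and the rule is an observation about this one record, not a documented part of the `pith-number/v1.0` schema.

```python
import base64


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to its
    canonical hash; the schema itself is not quoted in the record.
    """
    raw = bytes.fromhex(hex_digest)
    encoded = base64.b32encode(raw).decode("ascii")  # RFC 4648 alphabet A-Z, 2-7
    return encoded[:length]


# Canonical hash as printed in the record.
CANONICAL_HASH = "a6c9ec6d9361451911eb6aed49d47b3c5ef3f6c0df5b1a0711cff4018c1b1596"

full = pith_number_from_hash(CANONICAL_HASH)
# Short aliases are plain prefixes of the full identifier.
aliases = {n: full[:n] for n in (8, 12, 16)}

print(full)     # U3E6Y3MTMFCRSEPLNLWUTVD3HR
print(aliases)
```

If the relationship holds in general, a mirror could check an alias against a canonical hash without contacting the server. That generalization is untested here.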
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/U3E6Y3MTMFCRSEPLNLWUTVD3HR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a6c9ec6d9361451911eb6aed49d47b3c5ef3f6c0df5b1a0711cff4018c1b1596
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ec92b10b055b4c51f8bfeb8b5d713232228d3308586cc7838a46d764aef1b4e0",
    "cross_cats_sorted": [
      "cs.LG",
      "math.IT"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.IT",
    "submitted_at": "2026-05-13T08:10:05Z",
    "title_canon_sha256": "9e2d525555f8b45ca44c8252dce441cbd5e0ead050b8a393bfa1ec933bbe24fd"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13143",
    "kind": "arxiv",
    "version": 1
  }
}
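The canonicalization step in the verification command can also be written as a small standalone function, which makes it easier to see why the hash is reproducible. This is a minimal sketch that uses the same `json.dumps` settings as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes). The function name `canonical_sha256` and the two sample dictionaries are illustrative only and are not part of the Pith API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize `record` deterministically and return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order no longer depends on insertion order
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is before encoding
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two dictionaries with the same content but different key order
# (sample data for illustration, not the full canonical record).
a = {"source": {"id": "2605.13143", "kind": "arxiv", "version": 1}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 1, "kind": "arxiv", "id": "2605.13143"}}

same = canonical_sha256(a) == canonical_sha256(b)
print(same)  # True
```

Because keys are sorted and whitespace is fixed, any mirror that holds the same record content should compute the same digest, whatever order the JSON arrives in.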