{"paper":{"title":"A Survey on Knowledge Distillation of Large Language Models","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Knowledge distillation transfers advanced capabilities from proprietary LLMs like GPT-4 to open-source models such as LLaMA and Mistral.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Can Xu, Chongyang Tao, Dacheng Tao, Jinyang Li, Ming Li, Reynold Cheng, Tao Shen, Tianyi Zhou, Xiaohan Xu","submitted_at":"2024-02-20T16:17:37Z","abstract_excerpt":"In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and Mistral. Additionally, as open-source LLMs flourish, KD plays a crucial role in both compressing these models, and facilitating their self-improvement by employing themselves as teachers. This paper presents a comprehensive survey of KD's role within the realm of LLM, highlighting its critical function in imparting advanced knowledge to smaller models and its"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"KD emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and Mistral, while also enabling model compression and self-improvement.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That data augmentation within the KD framework can reliably enable open-source models to approximate the contextual adeptness, ethical alignment, and deep semantic insights of proprietary models.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Knowledge distillation transfers advanced capabilities from proprietary LLMs like GPT-4 to open-source models such as LLaMA and Mistral.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e3b88ff0303c0d4d2e95c70c9ed3150d2102ed0863b08d9400bc05dadafabdf4"},"source":{"id":"2402.13116","kind":"arxiv","version":4},"verdict":{"id":"c8c748e0-8724-479c-9fec-65c18e1c3267","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T23:27:17.705659Z","strongest_claim":"KD emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and Mistral, while also enabling model compression and self-improvement.","one_line_summary":"A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That data augmentation within the KD framework can reliably enable open-source models to approximate the contextual adeptness, ethical alignment, and deep semantic insights of proprietary models.","pith_extraction_headline":"Knowledge distillation transfers advanced capabilities from proprietary LLMs like GPT-4 to open-source models such as LLaMA and Mistral."},"references":{"count":300,"sample":[{"doi":"","year":null,"title":"Advances in Neural Information Processing Systems , volume=","work_id":"c2cc414b-17c6-4006-b389-f3ec0bf8141b","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"arXiv preprint arXiv:2304.14233 , year=","work_id":"6ee50a7d-801e-4444-93a9-5496cce785bd","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"arXiv preprint arXiv:2305.07402 , year=","work_id":"5ead8756-3597-48cb-afe3-a2ec06e20dbf","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"arXiv preprint arXiv:2212.10192 , year=","work_id":"e82d7266-b4ea-4926-b70c-0833cea400f0","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"The Eleventh International Conference on Learning Representations , year=","work_id":"49c26f65-5708-4074-91fa-3703224fe0a9","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":300,"snapshot_sha256":"e603a99c43e130929671350775832f5916ccf292fb130c9e311d07ebe0688d0f","internal_anchors":25},"formal_canon":{"evidence_count":3,"snapshot_sha256":"daf0e6e47c588826aa57be3413278158443d5742bee93599f2d5c63c4dc5478e"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}