{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:WFIONFVJODUMKVU2PDA2TKUBJ3","short_pith_number":"pith:WFIONFVJ","schema_version":"1.0","canonical_sha256":"b150e696a970e8c5569a78c1a9aa814efedbc66c4b0456d998051e730b9a22f7","source":{"kind":"arxiv","id":"2603.24472","version":3},"attestation_state":"computed","paper":{"title":"Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Self-distillation suppresses uncertainty expression in LLMs, degrading performance on out-of-domain reasoning tasks.","cross_cats":["cs.LG"],"primary_cat":"cs.CL","authors_text":"Dohyung Kim, Dongsheng Li, Jeonghye Kim, Jiwon Jeon, Minbeom Kim, Sangmook Lee, Xufang Luo, Yuqing Yang","submitted_at":"2026-03-25T16:14:52Z","abstract_excerpt":"Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can reduce response length while degrading performance. We trace this degradation to the suppression of epistemic verbalization - the model's expression of uncertainty during reasoning. Through controlled experiments varying conditioning context richness and task coverage, we show that conditioning the teacher on rich information suppresses uncertainty expression, enabling rapid in-domain optimization w"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2603.24472","kind":"arxiv","version":3},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2026-03-25T16:14:52Z","cross_cats_sorted":["cs.LG"],"title_canon_sha256":"03137f3646289404422d4a6c18eca0b1dbcb2452d5e6ab0bbf88e34684653b9c","abstract_canon_sha256":"f95f0ba5cc03b4fe84a242dc9bb2fd6e21347a4d85e658a8d648c74dac6e2efa"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-21T02:05:01.722102Z","signature_b64":"mJGlmi7rzqhiDhnXcbQpEEBVkc5ZRkLHadfFutN18JLPT0qJhyAF3s87NPGnhinDt9Gt1YHa8gwedwbSjAUqCA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"b150e696a970e8c5569a78c1a9aa814efedbc66c4b0456d998051e730b9a22f7","last_reissued_at":"2026-05-21T02:05:01.721388Z","signature_status":"signed_v1","first_computed_at":"2026-05-21T02:05:01.721388Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Self-distillation suppresses uncertainty expression in LLMs, degrading performance on out-of-domain reasoning tasks.","cross_cats":["cs.LG"],"primary_cat":"cs.CL","authors_text":"Dohyung Kim, Dongsheng Li, Jeonghye Kim, Jiwon Jeon, Minbeom Kim, Sangmook Lee, Xufang Luo, Yuqing Yang","submitted_at":"2026-03-25T16:14:52Z","abstract_excerpt":"Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can reduce response length while degrading performance. We trace this degradation to the suppression of epistemic verbalization - the model's expression of uncertainty during reasoning. Through controlled experiments varying conditioning context richness and task coverage, we show that conditioning the teacher on rich information suppresses uncertainty expression, enabling rapid in-domain optimization w"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"conditioning the teacher on rich information suppresses uncertainty expression, enabling rapid in-domain optimization with limited task coverage but harming OOD performance, where unseen problems benefit from expressing uncertainty and adjusting accordingly.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that the observed OOD degradation is caused primarily by suppression of epistemic verbalization rather than other unmeasured changes from self-distillation or model-specific factors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Self-distillation suppresses epistemic verbalization in LLMs, causing up to 40% drops in out-of-domain mathematical reasoning despite in-domain gains.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Self-distillation suppresses uncertainty expression in LLMs, degrading performance on out-of-domain reasoning tasks.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"5d58d0e430f3698abd0d40c2fc13c88f7d4053b6fe64a745c410d077c0a10fb3"},"source":{"id":"2603.24472","kind":"arxiv","version":3},"verdict":{"id":"c31d0759-01f4-477d-bc98-b47939c0503c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T00:39:27.721013Z","strongest_claim":"conditioning the teacher on rich information suppresses uncertainty expression, enabling rapid in-domain optimization with limited task coverage but harming OOD performance, where unseen problems benefit from expressing uncertainty and adjusting accordingly.","one_line_summary":"Self-distillation suppresses epistemic verbalization in LLMs, causing up to 40% drops in out-of-domain mathematical reasoning despite in-domain gains.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that the observed OOD degradation is caused primarily by suppression of epistemic verbalization rather than other unmeasured changes from self-distillation or model-specific factors.","pith_extraction_headline":"Self-distillation suppresses uncertainty expression in LLMs, degrading performance on out-of-domain reasoning tasks."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2603.24472/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"b59edff3944e09bc325517cd6cfe3fc041d47eaaf29e35530f051d9abe8b0457"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2603.24472","created_at":"2026-05-21T02:05:01.721487+00:00"},{"alias_kind":"arxiv_version","alias_value":"2603.24472v3","created_at":"2026-05-21T02:05:01.721487+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2603.24472","created_at":"2026-05-21T02:05:01.721487+00:00"},{"alias_kind":"pith_short_12","alias_value":"WFIONFVJODUM","created_at":"2026-05-21T02:05:01.721487+00:00"},{"alias_kind":"pith_short_16","alias_value":"WFIONFVJODUMKVU2","created_at":"2026-05-21T02:05:01.721487+00:00"},{"alias_kind":"pith_short_8","alias_value":"WFIONFVJ","created_at":"2026-05-21T02:05:01.721487+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":25,"internal_anchor_count":25,"sample":[{"citing_arxiv_id":"2605.16865","citing_title":"MixSD: Mixed Contextual Self-Distillation for Knowledge Injection","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18141","citing_title":"A Brief Overview: On-Policy Self-Distillation In Large Language Models","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22263","citing_title":"Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15113","citing_title":"Learning from Language Feedback via Variational Policy Distillation","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18141","citing_title":"A Brief Overview: On-Policy Self-Distillation In Large Language Models","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16865","citing_title":"MixSD: Mixed Contextual Self-Distillation for Knowledge Injection","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14539","citing_title":"Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2604.14164","citing_title":"How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12652","citing_title":"Multi-Rollout On-Policy Distillation via Peer Successes and Failures","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12741","citing_title":"Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11505","citing_title":"Selective Off-Policy Reference Tuning with Plan Guidance","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09725","citing_title":"On-Policy Distillation with Best-of-N Teacher Rollout Selection","ref_index":22,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13255","citing_title":"Respecting Self-Uncertainty in On-Policy Self-Distillation for Efficient LLM Reasoning","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11505","citing_title":"Selective Off-Policy Reference Tuning with Plan Guidance","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11609","citing_title":"Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09725","citing_title":"On-Policy Distillation with Best-of-N Teacher Rollout Selection","ref_index":22,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10781","citing_title":"Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10194","citing_title":"TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05040","citing_title":"Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2604.13016","citing_title":"Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2604.08527","citing_title":"Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02971","citing_title":"Multilingual Safety Alignment via Self-Distillation","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07865","citing_title":"KL for a KL: On-Policy Distillation with Control Variate Baseline","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16158","citing_title":"AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02971","citing_title":"Multilingual Safety Alignment via Self-Distillation","ref_index":14,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/WFIONFVJODUMKVU2PDA2TKUBJ3","json":"https://pith.science/pith/WFIONFVJODUMKVU2PDA2TKUBJ3.json","graph_json":"https://pith.science/api/pith-number/WFIONFVJODUMKVU2PDA2TKUBJ3/graph.json","events_json":"https://pith.science/api/pith-number/WFIONFVJODUMKVU2PDA2TKUBJ3/events.json","paper":"https://pith.science/paper/WFIONFVJ"},"agent_actions":{"view_html":"https://pith.science/pith/WFIONFVJODUMKVU2PDA2TKUBJ3","download_json":"https://pith.science/pith/WFIONFVJODUMKVU2PDA2TKUBJ3.json","view_paper":"https://pith.science/paper/WFIONFVJ","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2603.24472&json=true","fetch_graph":"https://pith.science/api/pith-number/WFIONFVJODUMKVU2PDA2TKUBJ3/graph.json","fetch_events":"https://pith.science/api/pith-number/WFIONFVJODUMKVU2PDA2TKUBJ3/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/WFIONFVJODUMKVU2PDA2TKUBJ3/action/timestamp_anchor","attest_storage":"https://pith.science/pith/WFIONFVJODUMKVU2PDA2TKUBJ3/action/storage_attestation","attest_author":"https://pith.science/pith/WFIONFVJODUMKVU2PDA2TKUBJ3/action/author_attestation","sign_citation":"https://pith.science/pith/WFIONFVJODUMKVU2PDA2TKUBJ3/action/citation_signature","submit_replication":"https://pith.science/pith/WFIONFVJODUMKVU2PDA2TKUBJ3/action/replication_record"}},"created_at":"2026-05-21T02:05:01.721487+00:00","updated_at":"2026-05-21T02:05:01.721487+00:00"}