{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:KW7XYM5B6RYRB4ECLKAOQB534H","short_pith_number":"pith:KW7XYM5B","schema_version":"1.0","canonical_sha256":"55bf7c33a1f47110f0825a80e807bbe1df4d20a11d0fda0a615ccbf8dd5f3608","source":{"kind":"arxiv","id":"2605.11739","version":3},"attestation_state":"computed","paper":{"title":"Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"On-policy distillation locks onto a stable update trajectory toward the final model early in training.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Chunxi Luo, Ding Cao, Guangzhong Sun, Guiquan Liu, Junfeng Fang, Kai Yang, Liang Lin, Saiyong Yang, Tianxiang Zhao, Weijie Liu, Xin Xu, Yuchen Cai","submitted_at":"2026-05-12T08:19:15Z","abstract_excerpt":"On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level mechanisms underlying OPD's efficiency remain poorly understood. In this work, we argue that OPD's efficiency stems from a form of ``foresight'': it establishes a stable update trajectory toward the final model early in training. This foresight manifests in two aspects. First, at the \\textbf{Module-Allocation Level}, OPD identifies regions with low marginal utility a"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"2605.11739","kind":"arxiv","version":3},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2026-05-12T08:19:15Z","cross_cats_sorted":[],"title_canon_sha256":"24a4e720c67fa9bfcfb3d2ff0a86b4c0190ab5ee5944ed8d591d6b7e1b54d7b9","abstract_canon_sha256":"29be5cf5a4a7a59eae794005e1ab80404060a346f322416b0b85aff6eda66c38"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-22T01:04:06.081114Z","signature_b64":"d263Dx+FJWay/HJYnYShwqTVJ8mGAA7vmTFJy2Z3En82aNzvcb8t9GLMjqNyeIZtecizlBMcCk3a4HVbGnHgCg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"55bf7c33a1f47110f0825a80e807bbe1df4d20a11d0fda0a615ccbf8dd5f3608","last_reissued_at":"2026-05-22T01:04:06.080374Z","signature_status":"signed_v1","first_computed_at":"2026-05-22T01:04:06.080374Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"On-policy distillation locks onto a stable update trajectory toward the final model early in training.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Chunxi Luo, Ding Cao, Guangzhong Sun, Guiquan Liu, Junfeng Fang, Kai Yang, Liang Lin, Saiyong Yang, Tianxiang Zhao, Weijie Liu, Xin Xu, Yuchen Cai","submitted_at":"2026-05-12T08:19:15Z","abstract_excerpt":"On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level mechanisms underlying OPD's efficiency remain poorly understood. In this work, we argue that OPD's efficiency stems from a form of ``foresight'': it establishes a stable update trajectory toward the final model early in training. This foresight manifests in two aspects. First, at the \\textbf{Module-Allocation Level}, OPD identifies regions with low marginal utility a"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"OPD's efficiency stems from a form of ``foresight'': it establishes a stable update trajectory toward the final model early in training. This manifests at the Module-Allocation Level by concentrating updates on critical modules and at the Update-Direction Level by stronger low-rank concentration aligning with the final subspace, enabling EffOPD to achieve an average 3x training acceleration.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the observed module utility patterns and low-rank alignment are causal drivers of efficiency rather than correlated side effects, and that adaptive extrapolation along the current direction generalizes without degrading final performance across diverse tasks and model scales.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"On-policy distillation gains efficiency from early foresight in module allocation and low-rank update directions, enabling EffOPD to accelerate training by 3x via adaptive extrapolation without extra modules or tuning.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"On-policy distillation locks onto a stable update trajectory toward the final model early in training.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"8010a9c1153b6d94cee7b7e20e7e0b95e55b5019d24ae6c1f63dd8966f4136f0"},"source":{"id":"2605.11739","kind":"arxiv","version":3},"verdict":{"id":"9733fa8f-c567-456e-bbaa-c270fa66a31e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T21:04:08.675163Z","strongest_claim":"OPD's efficiency stems from a form of ``foresight'': it establishes a stable update trajectory toward the final model early in training. This manifests at the Module-Allocation Level by concentrating updates on critical modules and at the Update-Direction Level by stronger low-rank concentration aligning with the final subspace, enabling EffOPD to achieve an average 3x training acceleration.","one_line_summary":"On-policy distillation gains efficiency from early foresight in module allocation and low-rank update directions, enabling EffOPD to accelerate training by 3x via adaptive extrapolation without extra modules or tuning.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the observed module utility patterns and low-rank alignment are causal drivers of efficiency rather than correlated side effects, and that adaptive extrapolation along the current direction generalizes without degrading final performance across diverse tasks and model scales.","pith_extraction_headline":"On-policy distillation locks onto a stable update trajectory toward the final model early in training."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.11739/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-20T23:31:31.832844Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-20T13:35:23.523068Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-20T03:42:00.408476Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T11:38:46.995284Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"f5301e0f2168bb5d7401fe6b4f8c5cfb08a5b3dac492ada2e6796f4be89f5e7d"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.11739","created_at":"2026-05-22T01:04:06.080475+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.11739v3","created_at":"2026-05-22T01:04:06.080475+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.11739","created_at":"2026-05-22T01:04:06.080475+00:00"},{"alias_kind":"pith_short_12","alias_value":"KW7XYM5B6RYR","created_at":"2026-05-22T01:04:06.080475+00:00"},{"alias_kind":"pith_short_16","alias_value":"KW7XYM5B6RYRB4EC","created_at":"2026-05-22T01:04:06.080475+00:00"},{"alias_kind":"pith_short_8","alias_value":"KW7XYM5B","created_at":"2026-05-22T01:04:06.080475+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/KW7XYM5B6RYRB4ECLKAOQB534H","json":"https://pith.science/pith/KW7XYM5B6RYRB4ECLKAOQB534H.json","graph_json":"https://pith.science/api/pith-number/KW7XYM5B6RYRB4ECLKAOQB534H/graph.json","events_json":"https://pith.science/api/pith-number/KW7XYM5B6RYRB4ECLKAOQB534H/events.json","paper":"https://pith.science/paper/KW7XYM5B"},"agent_actions":{"view_html":"https://pith.science/pith/KW7XYM5B6RYRB4ECLKAOQB534H","download_json":"https://pith.science/pith/KW7XYM5B6RYRB4ECLKAOQB534H.json","view_paper":"https://pith.science/paper/KW7XYM5B","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.11739&json=true","fetch_graph":"https://pith.science/api/pith-number/KW7XYM5B6RYRB4ECLKAOQB534H/graph.json","fetch_events":"https://pith.science/api/pith-number/KW7XYM5B6RYRB4ECLKAOQB534H/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/KW7XYM5B6RYRB4ECLKAOQB534H/action/timestamp_anchor","attest_storage":"https://pith.science/pith/KW7XYM5B6RYRB4ECLKAOQB534H/action/storage_attestation","attest_author":"https://pith.science/pith/KW7XYM5B6RYRB4ECLKAOQB534H/action/author_attestation","sign_citation":"https://pith.science/pith/KW7XYM5B6RYRB4ECLKAOQB534H/action/citation_signature","submit_replication":"https://pith.science/pith/KW7XYM5B6RYRB4ECLKAOQB534H/action/replication_record"}},"created_at":"2026-05-22T01:04:06.080475+00:00","updated_at":"2026-05-22T01:04:06.080475+00:00"}