Pith Number

pith:6CAYQMLC

pith:2026:6CAYQMLC2AWU7XRFPLFHJHAFOB

not attested · not anchored · not stored · refs resolved

Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation

Bing Wang, Jieping Ye, Kaiyuan Liu, Rongxiang Weng, Yang Bai, Ziyuan Zhuang

In strong-to-weak on-policy distillation, truncating supervision at the onset of local teachability collapse outperforms full-trajectory training.

arxiv:2605.13643 v1 · 2026-05-13 · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{6CAYQMLC2AWU7XRFPLFHJHAFOB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

supervision should concentrate on trajectory regions where the teacher's feedback remains discriminative, rather than uniformly covering the entire response. We operationalize this principle through a trajectory-specific release rule... Experimental results... indicate that this release rule consistently outperforms standard full-trajectory OPD across five in-domain benchmarks.

C2 · weakest assumption

That the BIC-style downward change point on NLTK-sentence-aggregated teacher margins over the student's top-K set reliably identifies the onset of local teachability collapse without prematurely cutting useful supervision or retaining non-discriminative tokens.

C3 · one-line summary

Local teachability collapse in trajectory suffixes makes uniform dense supervision suboptimal in strong-to-weak OPD; truncating at BIC-style change points on teacher margin improves performance.
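To make the idea of a BIC-style downward change point concrete, here is a minimal Python sketch. It is not the paper's release rule: the per-sentence margin values, the Gaussian piecewise-constant cost, the parameter counts in the penalty, and the names `downward_change_point` and `min_seg` are all assumptions chosen for illustration. Computing teacher margins over the student's top-K set and splitting the response into sentences are outside its scope.

```python
import math

def sse(xs):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not xs:
        return 0.0
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def bic(rss, n, k):
    """Gaussian BIC up to an additive constant: n*log(rss/n) + k*log(n)."""
    return n * math.log(max(rss, 1e-12) / n) + k * math.log(n)

def downward_change_point(margins, min_seg=2):
    """Return the index where a lower-mean suffix starts, or None.

    `margins` is a hypothetical list of per-sentence teacher margins.
    A split is accepted only if it lowers the BIC relative to the
    single-mean model and the suffix mean is below the prefix mean.
    """
    n = len(margins)
    if n < 2 * min_seg:
        return None
    best_t = None
    best_score = bic(sse(margins), n, 1)  # no-change model: one mean
    for t in range(min_seg, n - min_seg + 1):
        left, right = margins[:t], margins[t:]
        if sum(right) / len(right) >= sum(left) / len(left):
            continue  # keep only downward shifts
        score = bic(sse(left) + sse(right), n, 3)  # two means + location
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Illustrative (made-up) margins: high in the prefix, near zero in the suffix.
margins = [2.0, 2.1, 1.9, 2.0, 0.1, 0.0, 0.2, 0.1]
cut = downward_change_point(margins)
supervised = margins if cut is None else margins[:cut]
print(cut, supervised)
```

Under these assumptions, supervision would be kept for the sentences before `cut` and released for the rest. A flat or rising margin sequence returns `None`, so the whole trajectory stays supervised.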

References

51 extracted · 51 resolved · 18 Pith anchors

[1] On-policy distillation of language models: Learning from self-generated mistakes 2024

[2] Program Synthesis with Large Language Models 2021 · arXiv:2108.07732

[3] Online difficulty filtering for reasoning oriented reinforcement learning 2026

[4] MathArena: Evaluating LLMs on Uncontaminated Math Competitions 2025 · arXiv:2505.23281

[5] Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc., 2009

Receipt and verification

First computed	2026-05-18T02:44:17.571911Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

f081883162d02d4fde257aca749c05706f026011bdbc4f358c0417fa8f5b7cd7

Aliases

arxiv: 2605.13643 · arxiv_version: 2605.13643v1 · doi: 10.48550/arxiv.2605.13643 · pith_short_12: 6CAYQMLC2AWU · pith_short_16: 6CAYQMLC2AWU7XRF · pith_short_8: 6CAYQMLC
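The identifiers on this page appear to be related to the canonical hash. Base32-encoding the raw bytes of the hash (standard RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number, and the `pith_short_*` aliases are its leading 8, 12, and 16 characters. The sketch below checks that observation. The helper name `pith_from_hash` and the 26-character length are inferred from this record alone and are assumptions, not a documented specification.

```python
import base64

CANONICAL_HASH = "f081883162d02d4fde257aca749c05706f026011bdbc4f358c0417fa8f5b7cd7"
PITH_NUMBER = "6CAYQMLC2AWU7XRFPLFHJHAFOB"

def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: Base32-encode the digest bytes and truncate."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

derived = pith_from_hash(CANONICAL_HASH)

# Short aliases as plain prefixes of the derived identifier.
short_aliases = {f"pith_short_{n}": derived[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)  # True for the values on this page
print(short_aliases)
```

If the relationship holds in general, any alias can be checked against a canonical hash offline, without contacting the resolver.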

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/6CAYQMLC2AWU7XRFPLFHJHAFOB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f081883162d02d4fde257aca749c05706f026011bdbc4f358c0417fa8f5b7cd7

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "e47065b2b3899f9d1ef26913dd54940e0ce0168bd5ddce746b04c5375f12abf1",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-13T15:05:30Z",
    "title_canon_sha256": "d6b1a3571dcfc254d2acd344a58689ed6aa89569657105e1ce4e6d52fdca7abc"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13643",
    "kind": "arxiv",
    "version": 1
  }
}
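The canonicalization used by the verification command above (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes, SHA-256) can also be run offline against the record as printed here. The following is a sketch under one assumption: that the JSON displayed on this page matches the `canonical_record` field the resolver serves. The function name `canonical_sha256` is illustrative.

```python
import hashlib
import json

# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "e47065b2b3899f9d1ef26913dd54940e0ce0168bd5ddce746b04c5375f12abf1",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-13T15:05:30Z",
        "title_canon_sha256": "d6b1a3571dcfc254d2acd344a58689ed6aa89569657105e1ce4e6d52fdca7abc",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13643", "kind": "arxiv", "version": 1},
}

def canonical_sha256(obj) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields were written, which is what lets independent mirrors arrive at the same value.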