pith:VPZTCO4J
Teacher-Guided Policy Optimization for LLM Distillation
Teacher-Guided Policy Optimization fixes uninformative feedback in reverse KL by conditioning teacher predictions on student rollouts.
arxiv:2605.13230 v1 · 2026-05-13 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{VPZTCO4J4EG4HSUEEIFR2J7HZT}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
TGPO significantly outperforms standard baselines and is robust to different teachers on complex reasoning benchmarks.
This rests on the assumption that conditioning teacher predictions on the student's rollout reliably produces informative directional guidance, even when the student and teacher distributions diverge substantially.
TGPO improves on-policy LLM distillation by using teacher predictions conditioned on student rollouts to supply informative guidance when the two distributions diverge.
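For readers unfamiliar with the quantity these claims refer to, the following is a minimal sketch of the standard per-token reverse KL, KL(student || teacher), averaged along a student-generated rollout. It is not the TGPO objective, which this record does not specify. The function names and the toy logits are hypothetical, and the teacher logits are assumed to have been computed on the same student-generated prefix at each position.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for a single next-token distribution."""
    log_p = log_softmax(student_logits)  # student log-probabilities
    log_q = log_softmax(teacher_logits)  # teacher log-probabilities
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def rollout_reverse_kl(student_steps, teacher_steps):
    """Mean per-token reverse KL along one student rollout.

    Each argument is a list with one logit vector per generated position.
    Assumption: teacher logits were scored on the student's own prefix.
    """
    kls = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(kls) / len(kls)


# Toy two-token vocabulary, two positions (illustrative values only).
student = [[2.0, 0.0], [0.5, 0.5]]
teacher = [[0.0, 2.0], [0.5, 0.5]]
loss = rollout_reverse_kl(student, teacher)
```

In this toy case the second position contributes nothing because the two distributions agree there, so the whole loss comes from the first position, where student and teacher disagree.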
Receipt and verification
| First computed | 2026-05-18T02:44:49.594387Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
abf3313b89e10dc3ca84220b1d27e7ccd7906413524cf5e18781579e1ba986ce
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VPZTCO4J4EG4HSUEEIFR2J7HZT \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: abf3313b89e10dc3ca84220b1d27e7ccd7906413524cf5e18781579e1ba986ce
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2c4910f7ce364db0a93c72122ee6e645a59abedf6cfd9fab5f920fd384541c66",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T09:20:03Z",
    "title_canon_sha256": "0e90ea93b8cdaae16215c07b5d8dfcd34c3814bd0bf4212ce3a2310bad428029"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13230",
    "kind": "arxiv",
    "version": 1
  }
}
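The same check can be run offline against the record above. The sketch below mirrors the serialization used in the shell one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and hashes the result with SHA-256. The function name `canonical_sha256` is hypothetical, and the sketch assumes the record shown here is the complete canonical record; compare the printed digest with the canonical hash listed on this page.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "2c4910f7ce364db0a93c72122ee6e645a59abedf6cfd9fab5f920fd384541c66",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T09:20:03Z",
        "title_canon_sha256": "0e90ea93b8cdaae16215c07b5d8dfcd34c3814bd0bf4212ce3a2310bad428029",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13230", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the order or indentation in which the JSON is displayed.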