pith:43MK434F
Multi-Rollout On-Policy Distillation via Peer Successes and Failures
By conditioning teacher signals on both successful and failed peer rollouts from the same prompt, multi-rollout on-policy distillation (MOPD) supplies denser and better-aligned supervision than single-rollout baselines.
arxiv:2605.12652 v1 · 2026-05-12 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{43MK434FGV2PB6DWQ25Y4V6ULU}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Experiments on competitive programming, mathematical reasoning, scientific question answering, and tool-use benchmarks show that MOPD consistently improves over standard on-policy baselines. Further teacher-signal analysis shows that mixed success-failure contexts better align teacher scores with verifier rewards.
These claims rest on the premise that the student's local rollout group can be used to construct teacher signals that are both more informative and better aligned with external verifier rewards, without introducing new biases from the peer selection process.
MOPD improves on-policy distillation for LLMs by using peer successes for positive patterns and failures for negative examples to create more informative teacher signals.
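The idea of conditioning on peer successes and failures can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's implementation: the `Rollout` type, the binary reward threshold of 0.5, the helper names `split_peers` and `build_teacher_context`, and the prompt layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rollout:
    text: str      # sampled response for the prompt
    reward: float  # verifier reward; assumed binary (1.0 = pass, 0.0 = fail)


def split_peers(group: List[Rollout], target: int) -> Tuple[List[Rollout], List[Rollout]]:
    """Split every rollout except `target` into (successes, failures)."""
    peers = [r for i, r in enumerate(group) if i != target]
    successes = [r for r in peers if r.reward > 0.5]
    failures = [r for r in peers if r.reward <= 0.5]
    return successes, failures


def build_teacher_context(prompt: str, group: List[Rollout], target: int) -> str:
    """Assemble a hypothetical teacher prompt from peer successes and failures."""
    successes, failures = split_peers(group, target)
    parts = [f"Problem:\n{prompt}"]
    parts += [f"Successful attempt:\n{r.text}" for r in successes]
    parts += [f"Failed attempt:\n{r.text}" for r in failures]
    # The rollout being scored goes last, after the peer context.
    parts.append(f"Attempt to score:\n{group[target].text}")
    return "\n\n".join(parts)


# Toy rollout group for one prompt (made-up data).
group = [Rollout("answer A", 1.0), Rollout("answer B", 0.0), Rollout("answer C", 1.0)]
context = build_teacher_context("What is 2 + 2?", group, target=2)
print(context)
```

In this sketch the rollout being scored is excluded from its own peer context, so the teacher sees only other attempts at the same prompt. Whether the paper selects or orders peers this way is not stated in the record.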
Receipt and verification
| First computed | 2026-05-18T03:09:50.759655Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
e6d8ae6f853574f0f87686bb8e57d45d14eaa6d5928503e5034afa5da6273372
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/43MK434FGV2PB6DWQ25Y4V6ULU \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e6d8ae6f853574f0f87686bb8e57d45d14eaa6d5928503e5034afa5da6273372
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "593fa5067f258c404222fd96a88c5f1b645eb03f85ac6ce256a6dbdd8e7b3fcc",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-12T18:57:44Z",
    "title_canon_sha256": "e19006a81b64cfbc13bb3059452dcfc0320e92d0a44498802fc5f562d060bf42"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12652",
    "kind": "arxiv",
    "version": 1
  }
}
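The verification can also be run offline against the record printed above. The sketch below applies the same canonicalization as the shell one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes, SHA-256). The names `RECORD_JSON`, `EXPECTED`, and `canonical_hash` are introduced here for illustration; the digest only matches the published hash if the record is transcribed exactly as served.

```python
import hashlib
import json

# Canonical record as shown on this page (transcribed by hand).
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "593fa5067f258c404222fd96a88c5f1b645eb03f85ac6ce256a6dbdd8e7b3fcc",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-12T18:57:44Z",
    "title_canon_sha256": "e19006a81b64cfbc13bb3059452dcfc0320e92d0a44498802fc5f562d060bf42"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.12652", "kind": "arxiv", "version": 1}
}
"""

# Canonical hash published on this page.
EXPECTED = "e6d8ae6f853574f0f87686bb8e57d45d14eaa6d5928503e5034afa5da6273372"


def canonical_hash(record: dict) -> str:
    """SHA-256 over the record serialized with sorted keys and compact separators."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_hash(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted and whitespace is normalized before hashing, the digest does not depend on how the JSON is indented or on the order in which keys appear.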