Black-box on-policy distillation of large language models. arXiv preprint arXiv:2511.10643

Canonical reference. 80% of classified citations from Pith papers cite this work as background.

11 Pith papers citing it
Background 80% of classified citations

citation-role summary

background: 4 · baseline: 1

fields

cs.LG: 6 · cs.CL: 5

years

2026: 11

representative citing papers

Rubric-based On-policy Distillation

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Rubric-based on-policy distillation trains student models using only teacher responses: it generates scoring rubrics from contrasts and uses them for on-policy optimization, achieving superior performance and up to 10x better sample efficiency than logit-based approaches.
