This includes identifying the target quantity, presenting the answer explicitly, and meeting format requirements.

Task Completion: Whether the response completes the task and produces the required final answer in the correct form.

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Rubric-based on-policy distillation allows training student models using only teacher responses by generating scoring rubrics from contrasts and using them for on-policy optimization, achieving superior performance and up to 10x better sample efficiency than logit-based approaches.

citing papers explorer

Showing 1 of 1 citing paper.

Rubric-based On-policy Distillation · cs.LG · 2026-05-08 · unverdicted · none · ref 46