OPHSD uses harness-augmented models as teachers to distill reasoning capabilities into base LLMs, yielding strong standalone performance on classification and math tasks.
On-policy distillation of language models: Learning from self- generated mistakes
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
years
2026 2roles
background 2polarities
background 2representative citing papers
citing papers explorer
-
Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning
OPHSD uses harness-augmented models as teachers to distill reasoning capabilities into base LLMs, yielding strong standalone performance on classification and math tasks.
- The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes