EDGE-OPD adds guided rollouts and evidence masking to on-policy self-distillation, enabling successful learning of target identities where standard OPSD and RLSD fail.
Training language models to follow instructions with human feedback.Advances in Neural Information Processing Systems, 35:27730–27744
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
EDGE-OPD: Internalizing Privileged Context with Evidence Guided On-Policy Distillation
EDGE-OPD adds guided rollouts and evidence masking to on-policy self-distillation, enabling successful learning of target identities where standard OPSD and RLSD fail.