SG-OPD adds sign-consistency gating and phased teacher sampling to on-policy distillation, reporting average gains of 1.98 per sample and 7.50 per question over standard OPD on math benchmarks.
A Additional Derivation Details
This appendix collects the full forms of the OPD/G-OPD/GRPO expressions referenced in §3 and the implementation formulas referenced in §4.
2 Pith papers cite this work.
representative citing papers
The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.
citing papers explorer
SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling