A. Additional Derivation Details. This appendix collects the full forms of the OPD/G-OPD/GRPO expressions referenced in §3 and the implementation formulas referenced in §4.

Model extrapolation expedites alignment · 2024 · arXiv 2404.16792

2 Pith papers cite this work.

representative citing papers

SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling

cs.CL · 2026-06-08 · unverdicted · novelty 4.0 · ref 14

SG-OPD adds sign-consistency gating and phased teacher sampling to on-policy distillation, reporting average gains of 1.98 per sample and 7.50 per question over standard OPD on math benchmarks.

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

cs.LG · 2024-08-14 · accept · novelty 4.0

The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.
