Title resolution pending

13 A Detailed Math Derivations Here, we make mathematical derivations to calculate the expected gradients of OPD objective in Eq · 2048

1 Pith paper cites this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

cs.LG · 2026-02-12 · conditional · novelty 6.0

Generalized on-policy distillation with reward scaling above one (ExOPD) lets student models surpass teacher performance when merging domain experts on math and code tasks.

citing papers explorer

Showing 1 of 1 citing paper.

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

cs.LG · 2026-02-12 · conditional · none · ref 26

Generalized on-policy distillation with reward scaling above one (ExOPD) lets student models surpass teacher performance when merging domain experts on math and code tasks.
