Proceedings of the Fourteenth International Conference on Learning Representations

Fu, Yuqian; Chen, Tinghong; Chai, Jiajun; Wang, Xihuai; Tu, Songjun; Yin, Guojun

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 3 refs

AOPD modifies on-policy distillation (OPD) by applying localized divergence minimization to tokens with non-positive advantages instead of negative reinforcement. It reports average gains over standard OPD on math reasoning benchmarks of 4.09 under strong initialization and 8.34 under weak initialization.

citing papers explorer

Showing 1 of 1 citing paper.

Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level
cs.LG · 2026-05-07 · unverdicted · none · ref 27 · 3 links
