Online preference alignment for language models via count-based exploration. arXiv preprint arXiv:2501.12735

Bai, C · arXiv 2501.12735

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

DRIFT achieves multi-turn RL performance via offline importance-weighted SFT by leveraging the equivalence of KL-regularized RL to weighted supervised learning.

