Energy-based preference model offers better offline alignment than the Bradley-Terry preference model. arXiv preprint arXiv:2412.13862

Yuzhong Hong, Hanshan Zhang, Junwei Bao, Hongfei Jiang, Yang Song · arXiv 2412.13862

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment

cs.LG · 2026-04-06 · unverdicted · novelty 6.0

Relative density ratio optimization stabilizes direct density ratio estimation for language model alignment while preserving statistical consistency without assuming a Bradley-Terry preference model.

citing papers explorer

Showing 1 of 1 citing paper.

Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment

cs.LG · 2026-04-06 · unverdicted · none · ref 8
