2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

cs.AI · 2026-05-20 · conditional · novelty 7.0

The equivalence between DPO and RLHF holds only on the condition that the optimal policy prefers the human-preferred responses; otherwise DPO optimizes relative advantage and can prefer worse outputs. The paper introduces CPO to address this failure mode.

citing papers explorer

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

cs.AI · 2026-05-20 · conditional · none · ref 26
