Insights into alignment: Evaluating DPO and its variants across multiple tasks

Amir Saeidi, Shivanshu Verma, Md Nayem Uddin, Chitta Baral · 2024 · arXiv 2404.14723

1 Pith paper cites this work. Polarity classification is still indexing.

Representative citing papers

cs.LG · 2025-11-30 · unverdicted · novelty 5.0

Gradient analysis and ablations show DPO and PPO have different target directions and component roles in preference optimization for LLMs.

Showing 1 of 1 citing paper.

What Is Preference Optimization Doing, and Why? · cs.LG · 2025-11-30 · unverdicted · none · ref 33