Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, and Yuxiong He

Direct preference optimization: Your language model is secretly a reward model · 2020

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization

cs.CL · 2026-05-11 · unverdicted · novelty 4.0

DGPO aggregates supervision at the group level with direction-aware multi-candidate comparisons to improve LLM alignment, delivering up to 3.6% average accuracy gains over baselines.

citing papers explorer

Showing 1 of 1 citing paper.

DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization

cs.CL · 2026-05-11 · unverdicted · none · ref 4

