DGPO aggregates supervision at the group level with direction-aware multi-candidate comparisons to improve LLM alignment, delivering up to 3.6% average accuracy gains over baselines.
Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, and Yuxiong He
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization