Find the remainder when the largest three-digit palindrome (999) is divided by this number
1 Pith paper cites this work. Polarity classification is still indexing.
fields
cs.CL 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization
DGPO aggregates supervision at the group level with direction-aware multi-candidate comparisons to improve LLM alignment, delivering up to 3.6% average accuracy gains over baselines.