Semi-DPO applies semi-supervised learning to noisy preference data in diffusion DPO by training first on consensus pairs then iteratively pseudo-labeling conflicts, yielding state-of-the-art alignment with complex human preferences.
6.3 THEUSE OFLARGELANGUAGEMODELS Large Language Models (LLMs) such as GPT were used solely for language polishing and clarity improvements in the writing of this paper
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization
Semi-DPO applies semi-supervised learning to noisy preference data in diffusion DPO by training first on consensus pairs then iteratively pseudo-labeling conflicts, yielding state-of-the-art alignment with complex human preferences.