SimPO: Simple preference optimization with a reference-free reward, in: Advances in Neural Information Processing Systems
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
EPPC-OASIS combines ontology-aware fine-tuning via Wasserstein alignment with structured inference refinement to extract EPPC codes from secure messages, reporting 77.13% Code+Sub-code F1 and 63.83% Triplet F1 with small gains over supervised fine-tuning baselines.
Citing papers explorer
Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings
Sequential DPO produces varied effects on prior preferences (partial degradation, stability, pair-level redistribution, or positive transfer) depending on objective relationships rather than uniform forgetting.