Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models

Akhil Agnihotri; Deepak Ramachandran; Rahul Jain; Zheng Wen

arxiv: 2505.10892 · v2 · pith:HFSUFVQXnew · submitted 2025-05-16 · 💻 cs.LG

classification 💻 cs.LG

keywords optimization · preference · alignment · mopo · multi-objective · objectives · models · objective

Post-training LLMs with RLHF and preference optimization methods (e.g., DPO, IPO) has greatly improved alignment, yet these approaches assume a single objective. In reality, humans express multiple, often conflicting objectives, such as helpfulness and harmlessness, with no natural scalarization. We study the multi-objective preference alignment problem, where a policy must balance several objectives simultaneously. We propose Multi-Objective Preference Optimization (MOPO), a constrained KL-regularized framework that maximizes a primary objective while enforcing lower bounds on secondary objectives via tunable safety thresholds. MOPO operates directly on pairwise preferences without point-wise rewards, and admits simple closed-form iterative updates. Empirically, MOPO recovers Pareto-optimal policies on synthetic benchmarks and, when fine-tuned on human-preference data, yields multi-billion parameter models that achieve higher rewards and Pareto-dominate baselines, with stable and robust optimization dynamics.
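One way to read the constrained, KL-regularized problem described in the abstract is sketched below. The notation is assumed for illustration and is not taken from the paper: $\rho$ is a prompt distribution, $\pi_{\mathrm{ref}}$ a reference policy, $\mu$ a comparison policy that supplies the second response in each pair, $p_k(y \succ y' \mid x)$ the probability that $y$ is preferred to $y'$ under objective $k$, $\tau > 0$ the regularization strength, and $b_k$ the tunable threshold for secondary objective $k$. The paper's actual formulation may differ in any of these choices.

```latex
\begin{aligned}
\max_{\pi}\quad
  & \mathbb{E}_{x\sim\rho,\; y\sim\pi(\cdot\mid x),\; y'\sim\mu(\cdot\mid x)}
    \big[\, p_1(y \succ y' \mid x) \,\big]
    \;-\; \tau\,\mathrm{KL}\big(\pi \,\|\, \pi_{\mathrm{ref}}\big) \\
\text{s.t.}\quad
  & \mathbb{E}_{x\sim\rho,\; y\sim\pi(\cdot\mid x),\; y'\sim\mu(\cdot\mid x)}
    \big[\, p_k(y \succ y' \mid x) \,\big] \;\ge\; b_k,
    \qquad k = 2,\dots,K.
\end{aligned}
```

Under this reading, objective 1 is the primary objective, each $b_k$ sets how much of secondary objective $k$ must be retained, and only pairwise preference probabilities appear, with no point-wise reward model.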

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment
cs.LG · 2026-04 · unverdicted · novelty 6.0

MGDA-Decoupled applies geometry-based multi-objective optimization within the DPO framework to find shared descent directions that account for each objective's convergence dynamics, yielding higher win rates on UltraFeedback.