Beyond one-preference-fits-all alignment: Multi-objective direct preference optimization

Zhanhui Zhou, Jie Liu, Jing Dong, Jiaheng Yang, et al · 2023 · arXiv 2310.03708

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 1 baseline 1

citation-polarity summary

background 1 baseline 1

representative citing papers

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

cs.AI · 2026-05-26 · unverdicted · novelty 7.0

VitaBench 2.0 introduces a benchmark for long-term personalized and proactive agent behavior, with results indicating substantial gaps in current frontier LLMs.

Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion

cs.AI · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

MORA breaks the safety-helpfulness ceiling in LLMs by pre-sampling single-reward prompts and rewriting them to incorporate multi-dimensional intents, delivering 5-12.4% gains in sequential alignment and 4.6% overall improvement in simultaneous alignment.

Improving Video Generation with Human Feedback

cs.CV · 2025-01-23 · unverdicted · novelty 6.0

A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.

Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization

cs.LG · 2024-10-25 · unverdicted · novelty 6.0

Diversity-regularized DPO fine-tuning of ProteinMPNN improves structural similarity scores by at least 8% over base model and sequence diversity by up to 20% over standard DPO for peptide inverse folding on OpenFold structures.

Enhancing Speech Large Language Models through Reinforced Behavior Alignment

cs.CL · 2025-08-25 · unverdicted · novelty 5.0

Reinforced Behavior Alignment (RBA) uses self-synthesized data from a teacher LLM and reinforcement learning to close the instruction-following gap in SpeechLMs, outperforming distillation and reaching SOTA on spoken QA and speech-to-text translation benchmarks.

citing papers explorer

Showing 5 of 5 citing papers after filters.

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions cs.AI · 2026-05-26 · unverdicted · none · ref 32
VitaBench 2.0 introduces a benchmark for long-term personalized and proactive agent behavior, with results indicating substantial gaps in current frontier LLMs.
Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion cs.AI · 2026-05-12 · unverdicted · none · ref 40 · 2 links
MORA breaks the safety-helpfulness ceiling in LLMs by pre-sampling single-reward prompts and rewriting them to incorporate multi-dimensional intents, delivering 5-12.4% gains in sequential alignment and 4.6% overall improvement in simultaneous alignment.
Improving Video Generation with Human Feedback cs.CV · 2025-01-23 · unverdicted · none · ref 90
A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.
Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization cs.LG · 2024-10-25 · unverdicted · none · ref 44
Diversity-regularized DPO fine-tuning of ProteinMPNN improves structural similarity scores by at least 8% over base model and sequence diversity by up to 20% over standard DPO for peptide inverse folding on OpenFold structures.
Enhancing Speech Large Language Models through Reinforced Behavior Alignment cs.CL · 2025-08-25 · unverdicted · none · ref 62
Reinforced Behavior Alignment (RBA) uses self-synthesized data from a teacher LLM and reinforcement learning to close the instruction-following gap in SpeechLMs, outperforming distillation and reaching SOTA on spoken QA and speech-to-text translation benchmarks.

Beyond one-preference-fits-all alignment: Multi-objective direct preference optimization

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer