Optimizing Language Models with Fair and Stable Reward Composition in Reinforcement Learning

Jiahui Li, Hanlin Zhang, Fengda Zhang, Tai · 2024 · DOI 10.18653/v1/2024.emnlp-main.565

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

MGDA-Decoupled applies geometry-based multi-objective optimization within the DPO framework to find shared descent directions that account for each objective's convergence dynamics, yielding higher win rates on UltraFeedback.

citing papers explorer

Showing 1 of 1 citing paper.

MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment cs.LG · 2026-04-22 · unverdicted · none · ref 50
