Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert · 2023

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

MGDA-Decoupled applies geometry-based multi-objective optimization within the DPO framework to find shared descent directions that account for each objective's convergence dynamics, yielding higher win rates on UltraFeedback.

citing papers explorer

Showing 1 of 1 citing paper.

MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment cs.LG · 2026-04-22 · unverdicted · none · ref 73
MGDA-Decoupled applies geometry-based multi-objective optimization within the DPO framework to find shared descent directions that account for each objective's convergence dynamics, yielding higher win rates on UltraFeedback.

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

fields

years

verdicts

representative citing papers

citing papers explorer