Coste, T. (2024). Reward Model Ensembles Help Mitigate Overoptimization. In The Twelfth International Conference on Learning Representations.

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Reinforcement Learning from Human Feedback: A Statistical Perspective

stat.ML · 2026-04-02 · accept · novelty 2.0

A statistical survey of RLHF for LLM alignment that connects preference learning and policy optimization to models like Bradley-Terry-Luce while reviewing methods, extensions, and open challenges.

citing papers explorer


Reinforcement Learning from Human Feedback: A Statistical Perspective · stat.ML · 2026-04-02 · accept · none · ref 18
