Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · method: 1

citation-polarity summary

verdicts

unverdicted: 11

representative citing papers

HybridFlow: A Flexible and Efficient RLHF Framework

cs.LG · 2024-09-28 · unverdicted · novelty 6.0

HybridFlow combines single- and multi-controller paradigms with a 3D-HybridEngine to deliver 1.53x to 20.57x higher throughput for various RLHF algorithms compared to prior systems.
