Review history
Provably avoiding over-optimization in Direct Preference Optimization without knowing the data distribution
- 2026-05-21 UNVERDICTED
- 2026-05-16 UNVERDICTED