Review history

arXiv: 2602.06239 · 2 revisions

Provably avoiding over-optimization in Direct Preference Optimization without knowing the data distribution

  1. 2026-05-21 UNVERDICTED LOW v0.9.0 novelty 5.0
    29396 ms 5668 in 1247 out 2026-05-21T13:02:08.817489+00:00
  2. 2026-05-16 UNVERDICTED LOW v0.9.0 novelty 6.0
    24065 ms 5437 in 1386 out 2026-05-16T06:32:34.967797+00:00