It is easy to notice that for any value along the horizontal axis the two values sum to 1

The standard sigmoid σ(x) is plotted in blue; σ(−x) = 1 − σ(x) is plotted with the dotted blue line · 2023
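The identity in the caption follows from the usual definition σ(x) = 1 / (1 + e^(−x)): substituting −x gives σ(−x) = 1 / (1 + e^x) = e^(−x) / (1 + e^(−x)) = 1 − σ(x). A minimal numerical sketch of that check is shown here; the function name `sigmoid` and the sample points are illustrative choices, not taken from the source.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


# Illustrative sample points along the horizontal axis (an assumption).
for x in (-5.0, -1.0, 0.0, 0.5, 3.0):
    total = sigmoid(x) + sigmoid(-x)
    print(f"x={x:+.1f}  sigma(x)={sigmoid(x):.6f}  "
          f"sigma(-x)={sigmoid(-x):.6f}  sum={total:.6f}")
```

At every sample point the two values should add up to 1 within floating-point rounding, which is the symmetry the two plotted curves illustrate.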

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Provably avoiding over-optimization in Direct Preference Optimization without knowing the data distribution

cs.LG · 2026-02-05 · unverdicted · novelty 5.0 · 2 refs

PEPO is a single-step pessimistic ensemble algorithm for direct preference optimization that provably avoids over-optimization by depending only on single-policy concentrability without knowing the data distribution or learning an explicit reward model.

citing papers explorer

Showing 1 of 1 citing paper.

Provably avoiding over-optimization in Direct Preference Optimization without knowing the data distribution · cs.LG · 2026-02-05 · unverdicted · none · ref 42 · 2 links
