What Matters in Data for DPO?

3 Pith papers cite this work. Polarity classification is still indexing.

citation summary

years: 2026 (3)

verdicts: unverdicted (3)

roles: background (1)

polarities: background (1)

representative citing papers

Which Pairs to Compare for LLM Post-Training?

cs.AI · 2026-06-17 · unverdicted · novelty 7.0

Matching upper and lower bounds on the DPO policy optimality gap are derived; both depend on a single design-dependent information matrix that links pair selection to estimation error and suboptimality.

$\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

ξ-DPO rewrites the preference objective as minimizing distance to optimal margins and defines reward as a chosen-to-rejected ratio, yielding a bounded, interpretable margin ξ set directly from the initial reward-gap distribution.

citing papers explorer

Showing 3 of 3 citing papers after filters.

  • Which Pairs to Compare for LLM Post-Training? cs.AI · 2026-06-17 · unverdicted · none · ref 15

    Matching upper and lower bounds on the DPO policy optimality gap are derived; both depend on a single design-dependent information matrix that links pair selection to estimation error and suboptimality.

  • Vision-driven Preference Synthesis for Mitigating Hallucinations in VLMs cs.CV · 2026-06-24 · unverdicted · none · ref 45

    ViPSy constructs policy-aligned and visually grounded preference pairs for VLMs via visual cues from image variants, yielding SOTA hallucination reductions of 35.7% on AMBER and 24.5% on Object HalBench.

  • $\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin cs.LG · 2026-05-09 · unverdicted · none · ref 23

    ξ-DPO rewrites the preference objective as minimizing distance to optimal margins and defines reward as a chosen-to-rejected ratio, yielding a bounded, interpretable margin ξ set directly from the initial reward-gap distribution.