Improved algorithms for linear stochastic bandits. Advances in Neural Information Processing Systems, 24

Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári · 2011

1 Pith paper cites this work. Polarity classification is still being indexed.

Representative citing papers

MASS-DPO: Multi-negative Active Sample Selection for Direct Policy Optimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

MASS-DPO derives a Plackett-Luce-specific log-determinant Fisher information objective to select non-redundant negative samples, matching or exceeding multi-negative DPO performance with substantially fewer negatives across four benchmarks and three model families.
