arXiv preprint arXiv:2501.12735

3 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.LG (3)

years: 2026 (3)

verdicts: UNVERDICTED (3)

representative citing papers

On Advantage Estimates for Max@K Policy Gradients

cs.LG · 2026-06-04 · unverdicted · novelty 6.0

Proposes MaxPO using a Leave-Two-Out baseline for centered unbiased advantages in max@K policy gradients, with a unified derivation of finite-batch estimators.
