Biometrics Bulletin
3 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
- Years: 2026 (3)
- Verdicts: UNVERDICTED (3)
- Roles: background (1)
- Polarities: background (1)
Citing papers
- Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients
POPO uses bounded importance sampling on positive rollouts and a siamese policy network to achieve implicit negative gradients and stable optimization, matching or exceeding GRPO on math benchmarks, for example reaching 36.67% on AIME 2025.
- Betting on Bets: Anytime-Valid Tests for Stochastic Dominance
Introduces anytime-valid e-processes for first- and higher-order stochastic dominance that achieve power one and remain valid under continuous monitoring.
- Foundation Models for Credit Risk Prediction: A Game Changer?
Tabular foundation models, used out of the box, outperform standard methods on credit risk probability-of-default (PD) and loss-given-default (LGD) tasks, with larger gains on smaller datasets.