arXiv: 2605.12667 · 2 revisions
ODRPO: Ordinal Decompositions of Discrete Rewards for Robust Policy Optimization