arxiv: 2605.05863 · 2 revisions
SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data