Review history

arXiv: 2605.05863 · 2 revisions

SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data

  1. 2026-05-21 CONDITIONAL LOW v0.9.0 novelty 6.0
    54918 ms 5735 in 1214 out 2026-05-21T09:09:56.890520+00:00
  2. 2026-05-08 UNVERDICTED LOW v0.9.0 novelty 7.0
    41725 ms 5504 in 1262 out 2026-05-08T14:50:37.993640+00:00