arxiv: 2605.06474 · 2 revisions
Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching