Evaluating the Robustness of Off-Policy Evaluation

Haruka Kiyohara; Kazuki Mogi; Kei Tateno; Takuma Udagawa; Yusuke Narita; Yuta Saito

arxiv: 2108.13703 · v1 · pith:KQWWPGGSnew · submitted 2021-08-31 · 📊 stat.ML · cs.AI · cs.LG

classification 📊 stat.ML · cs.AI · cs.LG

keywords evaluation · estimators · procedure · hyperparameters · ieoe · offline · policies · robustness

Off-policy evaluation (OPE), or offline evaluation in general, evaluates the performance of hypothetical policies using only offline log data. It is particularly useful in applications where online interaction is high-stakes and expensive, such as precision medicine and recommender systems. Since many OPE estimators have been proposed and some of them have hyperparameters to be tuned, there is an emerging challenge for practitioners to select and tune OPE estimators for their specific application. Unfortunately, identifying a reliable estimator from results reported in research papers is often difficult because the current experimental procedure evaluates and compares the estimators' performance on a narrow set of hyperparameters and evaluation policies. Therefore, it is difficult to know which estimator is safe and reliable to use. In this work, we develop Interpretable Evaluation for Offline Evaluation (IEOE), an experimental procedure to evaluate OPE estimators' robustness to changes in hyperparameters and/or evaluation policies in an interpretable manner. Then, using the IEOE procedure, we perform an extensive evaluation of a wide variety of existing estimators on the Open Bandit Dataset, a large-scale public real-world dataset for OPE. We demonstrate that our procedure can evaluate the estimators' robustness to the hyperparameter choice, helping us avoid using unsafe estimators. Finally, we apply IEOE to real-world e-commerce platform data and demonstrate how to use our protocol in practice.
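
The abstract does not spell out the individual steps of IEOE, so what follows is only a toy sketch of the general idea it describes: measuring how an estimator's error varies when hyperparameters and evaluation policies are varied instead of fixed. Everything in it is an assumption made for illustration, including the synthetic four-action bandit, the clipped inverse propensity scoring estimator, the two grids, and all function names. It is not the authors' implementation and does not use the Open Bandit Dataset.

```python
import math
import random


def softmax(scores, beta):
    """Softmax policy over actions; larger beta means a greedier policy."""
    scaled = [beta * s for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_ips(actions, rewards, pi_b, pi_e, clip):
    """Inverse propensity scoring with importance weights capped at `clip`."""
    weighted = [min(pi_e[a] / pi_b[a], clip) * r for a, r in zip(actions, rewards)]
    return sum(weighted) / len(weighted)


def robustness_trials(n_trials=200, n_rounds=500, seed=0):
    """Return sorted squared errors over randomly drawn configurations."""
    rng = random.Random(seed)
    mean_reward = [0.1, 0.3, 0.5, 0.7]   # hypothetical Bernoulli reward means
    pi_b = [0.25] * 4                     # hypothetical uniform logging policy
    clip_grid = [1.0, 2.0, 5.0, 100.0]    # hypothetical hyperparameter space
    beta_grid = [0.0, 2.0, 5.0, 10.0]     # hypothetical set of evaluation policies
    sq_errors = []
    for _ in range(n_trials):
        # Draw a hyperparameter and an evaluation policy for this trial.
        clip = rng.choice(clip_grid)
        pi_e = softmax(mean_reward, rng.choice(beta_grid))
        true_value = sum(p * q for p, q in zip(pi_e, mean_reward))
        # Simulate logged data collected under the logging policy.
        actions = rng.choices(range(4), weights=pi_b, k=n_rounds)
        rewards = [1.0 if rng.random() < mean_reward[a] else 0.0 for a in actions]
        estimate = clipped_ips(actions, rewards, pi_b, pi_e, clip)
        sq_errors.append((estimate - true_value) ** 2)
    return sorted(sq_errors)


def empirical_cdf(sorted_values, threshold):
    """Fraction of trials whose squared error is at most `threshold`."""
    return sum(v <= threshold for v in sorted_values) / len(sorted_values)


errors = robustness_trials()
print("median squared error:", errors[len(errors) // 2])
print("90th percentile squared error:", errors[int(0.9 * len(errors))])
print("share of trials with squared error <= 0.01:", empirical_cdf(errors, 0.01))
```

Looking at the whole distribution of errors, rather than a single error at one hand-picked setting, is what makes a poorly behaved tail visible. In this toy setup, an estimator whose upper percentiles are much worse than its median would be one that depends heavily on a lucky hyperparameter or evaluation policy.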

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Logging Policy Design for Off-Policy Evaluation
stat.ML 2026-05 unverdicted novelty 7.0

Derives optimal logging policies for off-policy evaluation by balancing reward concentration against action coverage in known, unknown, and partially known regimes of target policy and rewards.

Logging Policy Design for Off-Policy Evaluation
stat.ML 2026-05 unverdicted novelty 5.0

Derives optimal logging policies for minimizing off-policy evaluation error under known, unknown, and partially known target policies and reward distributions.