Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters

Aldo Faisal; Aniruddh Raghu; Emma Brunskill; Finale Doshi-Velez; Matthieu Komorowski; Omer Gottesman; Yao Liu

arxiv: 1807.01066 · v2 · pith:LWORRY3Pnew · submitted 2018-07-03 · 💻 cs.LG · stat.ML

Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill

keywords policy · behaviour · models · calibration · estimated · estimates · evaluation · off-policy

In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown. Via a series of empirical studies, we demonstrate how accurate OPE is strongly dependent on the calibration of estimated behaviour policy models: how precisely the behaviour policy is estimated from data. We show how powerful parametric models such as neural networks can result in highly uncalibrated behaviour policy models on a real-world medical dataset, and illustrate how a simple, non-parametric, k-nearest neighbours model produces better calibrated behaviour policy estimates and can be used to obtain superior importance sampling-based OPE estimates.
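To make the pipeline described in the abstract concrete, here is a minimal sketch: a k-nearest-neighbours estimate of the behaviour policy from logged state-action pairs, followed by trajectory-wise ordinary and weighted importance sampling. It is an illustration under stated assumptions, not the authors' implementation. The function names `knn_behaviour_policy` and `importance_sampling_estimates`, the Euclidean distance, the additive smoothing of neighbour counts, and the choice of trajectory-wise estimators are all hypothetical choices made here.

```python
import numpy as np


def knn_behaviour_policy(states, actions, n_actions, k=5, smoothing=1.0):
    """Build a kNN estimate of the behaviour policy from logged data.

    Returns a callable mapping query states of shape (n, d) to action
    probabilities of shape (n, n_actions). The additive smoothing is an
    assumption of this sketch: it keeps every probability strictly positive
    so that importance ratios stay finite.
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=int)

    def predict_proba(query):
        query = np.atleast_2d(np.asarray(query, dtype=float))
        # Squared Euclidean distances, shape (n_query, n_train).
        d2 = ((query[:, None, :] - states[None, :, :]) ** 2).sum(axis=2)
        # Indices of the k closest logged states for each query state.
        nearest = np.argsort(d2, axis=1)[:, :k]
        counts = np.zeros((query.shape[0], n_actions))
        for a in range(n_actions):
            counts[:, a] = (actions[nearest] == a).sum(axis=1)
        counts += smoothing
        return counts / counts.sum(axis=1, keepdims=True)

    return predict_proba


def importance_sampling_estimates(trajectories, pi_e, pi_b, gamma=1.0):
    """Trajectory-wise ordinary (IS) and weighted (WIS) importance sampling.

    trajectories: list of (states, actions, rewards) tuples, one per episode.
    pi_e, pi_b:   callables returning action probabilities (n, n_actions) for
                  the evaluation policy and the estimated behaviour policy.
    """
    weights, returns = [], []
    for s, a, r in trajectories:
        a = np.asarray(a, dtype=int)
        t = np.arange(len(a))
        # Per-step ratio pi_e(a_t | s_t) / pi_b(a_t | s_t).
        ratio = pi_e(s)[t, a] / pi_b(s)[t, a]
        weights.append(np.prod(ratio))
        returns.append(np.sum(gamma ** t * np.asarray(r, dtype=float)))
    w, g = np.array(weights), np.array(returns)
    return {"is": np.mean(w * g), "wis": np.sum(w * g) / np.sum(w)}
```

Because the estimated probabilities sit in the denominator of every per-step ratio, a poorly calibrated estimate of the behaviour policy distorts the trajectory weights multiplicatively over time, which is one way to read the abstract's claim that OPE accuracy depends on calibration.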

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
cs.LG · 2020-05 · unverdicted · novelty 2.0

Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.