Automating Control of Overestimation Bias for Reinforcement Learning

Alexander Grishin; Arsenii Ashukha; Arsenii Kuznetsov; Artem Tsypin; Artur Kadurin; Dmitry Vetrov

arXiv: 2110.13523 · v2 · pith:A4PUTKCPnew · submitted 2021-10-26 · cs.LG · cs.AI · cs.RO · stat.ML

Abstract

Overestimation bias control techniques are used by the majority of high-performing off-policy reinforcement learning algorithms. However, most of these techniques rely on predefined bias correction policies that are either insufficiently flexible or require environment-specific hyperparameter tuning. In this work, we present a general data-driven approach for automatically selecting bias control hyperparameters. We demonstrate its effectiveness on three algorithms: Truncated Quantile Critics, Weighted Delayed DDPG, and Maxmin Q-learning. The proposed technique eliminates the need for an extensive hyperparameter search. We show that it significantly reduces the actual number of environment interactions while preserving performance.
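The abstract does not spell out how the selection works, so the following is only an illustrative sketch under stated assumptions. It shows where a single bias-control knob enters the critic target in each of the three named algorithms, following their usual descriptions (TQC drops the largest pooled quantile atoms, WD3 blends the minimum and the mean of two critics with a weight beta, Maxmin Q-learning takes the minimum over N estimates), and pairs this with a hypothetical data-driven rule, `adjust_knob`, that nudges the knob by comparing critic estimates with observed returns. All function names, the update rule, its normalization, and the bounds are assumptions for illustration, not the authors' method.

```python
import statistics
from typing import List, Sequence


def tqc_truncated_atoms(atoms_per_critic: Sequence[Sequence[float]],
                        drop_per_critic: int) -> List[float]:
    """TQC-style truncation: pool the quantile atoms of all critics, sort them,
    and drop the drop_per_critic * n_critics largest ones.
    Dropping more atoms gives a more pessimistic target."""
    pooled = sorted(a for critic in atoms_per_critic for a in critic)
    n_keep = len(pooled) - drop_per_critic * len(atoms_per_critic)
    if n_keep <= 0:
        raise ValueError("cannot drop all atoms")
    return pooled[:n_keep]


def wd3_target(q1: float, q2: float, beta: float) -> float:
    """WD3-style blend: beta weights the pessimistic minimum of the two critics,
    (1 - beta) weights their mean. Higher beta means more pessimism."""
    return beta * min(q1, q2) + (1.0 - beta) * (q1 + q2) / 2.0


def maxmin_target(q_values: Sequence[float]) -> float:
    """Maxmin-style estimate: minimum over N critics.
    Using more critics makes the minimum more pessimistic in expectation."""
    return min(q_values)


def adjust_knob(knob: float,
                q_estimates: Sequence[float],
                observed_returns: Sequence[float],
                lr: float = 0.1,
                lo: float = 0.0,
                hi: float = 1.0) -> float:
    """Hypothetical data-driven update (an assumption, not the paper's rule).
    If critic estimates exceed the observed returns on average, the critic is
    overestimating, so the knob moves up (more pessimism); otherwise it moves down."""
    errors = [q - r for q, r in zip(q_estimates, observed_returns)]
    # Normalize by the spread of the returns so the step is roughly scale-free;
    # fall back to 1.0 when all returns are identical.
    scale = statistics.pstdev(observed_returns) or 1.0
    new_knob = knob + lr * statistics.mean(errors) / scale
    # Keep the knob inside its assumed valid range.
    return min(hi, max(lo, new_knob))


if __name__ == "__main__":
    print(tqc_truncated_atoms([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], drop_per_critic=1))
    # [1.0, 2.0, 3.0, 4.0]
    # Here the critic overestimates the observed returns, so beta is pushed up.
    print(adjust_knob(0.5, q_estimates=[3.0, 3.0], observed_returns=[1.0, 3.0]))
```

The convention that a larger knob always means more pessimism lets one update rule serve all three targets. For integer knobs such as the number of dropped atoms or the number of critics N, one would round the adjusted value; that step, like the bounds `lo` and `hi`, is an assumption of this sketch.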