arXiv: 2110.13523 · v2 · pith:A4PUTKCPnew · submitted 2021-10-26 · 💻 cs.LG · cs.AI · cs.RO · stat.ML

Automating Control of Overestimation Bias for Reinforcement Learning

classification 💻 cs.LG cs.AI cs.RO stat.ML
keywords bias, control, algorithms, hyperparameters, learning, overestimation, reinforcement, techniques

Overestimation bias control techniques are used by the majority of high-performing off-policy reinforcement learning algorithms. However, most of these techniques rely on pre-defined bias correction policies that are either not flexible enough or require environment-specific tuning of hyperparameters. In this work, we present a general data-driven approach for the automatic selection of bias control hyperparameters. We demonstrate its effectiveness on three algorithms: Truncated Quantile Critics, Weighted Delayed DDPG, and Maxmin Q-learning. The proposed technique eliminates the need for an extensive hyperparameter search. We show that it leads to a significant reduction of the actual number of interactions while preserving the performance.
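To make the notion of a bias control hyperparameter concrete, here is a minimal sketch of the Maxmin-style target (minimum over an ensemble of estimates, then maximum over actions) in a toy setting. It is not the paper's selection procedure, which the abstract does not describe. The function name `maxmin_target_bias`, the Gaussian noise model, and all default values are assumptions made for illustration only.

```python
import random


def maxmin_target_bias(num_estimators, num_actions=5, noise_std=1.0,
                       trials=5000, seed=0):
    """Average of max_a min_i Q_i(a) when every true action value is 0.

    Assumption: each estimate Q_i(a) is the true value (0) plus independent
    Gaussian noise, so any nonzero average is pure estimation bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Minimum over the ensemble for each action, then maximum over actions.
        per_action = [
            min(rng.gauss(0.0, noise_std) for _ in range(num_estimators))
            for _ in range(num_actions)
        ]
        total += max(per_action)
    return total / trials


if __name__ == "__main__":
    # Sweep the ensemble size, which acts as the bias control hyperparameter.
    for n in (1, 2, 4, 8):
        print(n, round(maxmin_target_bias(n), 3))
```

With a single estimator the target is a plain maximum over noisy values and is biased upward; as the ensemble grows the bias shrinks and eventually turns negative. Where it crosses zero depends on the noise level and the number of actions, which is one way to see why a fixed, hand-picked setting may not transfer across environments.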
