A K-fold Method for Baseline Estimation in Policy Gradient Algorithms

Nithyanand Kota, Abhishek Mishra, Sunil Srinivasa, Xi (Peter) Chen, Pieter Abbeel

arXiv: 1701.00867 · v1 · submitted 2017-01-03 · cs.AI

The high variance of unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline. However, the fitted baseline can itself underfit or overfit. In this paper, we develop a K-fold method for baseline estimation in policy gradient algorithms. The parameter K is a baseline-estimation hyperparameter that adjusts the bias-variance trade-off in the baseline estimates. We demonstrate the usefulness of our approach with two state-of-the-art policy gradient algorithms on three MuJoCo locomotion control tasks.
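The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a cross-fitting scheme: trajectories are split into K folds, a baseline is fit on the other K-1 folds, and that baseline is used only for samples in the held-out fold, so no sample's advantage uses a baseline trained on its own return. The linear ridge-regression baseline, the feature matrix, and the helper names `fit_linear_baseline` and `kfold_advantages` are hypothetical choices made for illustration.

```python
import numpy as np


def fit_linear_baseline(features, returns, reg=1e-5):
    """Hypothetical baseline model: ridge-regularised linear least squares.

    Solves w = (X^T X + reg * I)^{-1} X^T y.
    """
    d = features.shape[1]
    a = features.T @ features + reg * np.eye(d)
    return np.linalg.solve(a, features.T @ returns)


def kfold_advantages(features, returns, traj_ids, k):
    """Advantages R - b(s), where b for each fold is fit on the other K-1 folds.

    features: (N, d) per-timestep state features (assumed, e.g. with a bias column)
    returns:  (N,) empirical returns-to-go
    traj_ids: (N,) trajectory index of each timestep
    k:        number of folds (hypothetical parameterisation of K)
    """
    unique_trajs = np.unique(traj_ids)
    if k < 2 or k > len(unique_trajs):
        raise ValueError("k must be between 2 and the number of trajectories")

    # Assign whole trajectories (not single timesteps) to folds round-robin.
    fold_of_traj = {t: i % k for i, t in enumerate(unique_trajs)}
    folds = np.array([fold_of_traj[t] for t in traj_ids])

    baseline = np.empty_like(returns, dtype=float)
    for f in range(k):
        held_out = folds == f
        # Fit on the other K-1 folds, then predict only on the held-out fold.
        w = fit_linear_baseline(features[~held_out], returns[~held_out])
        baseline[held_out] = features[held_out] @ w
    return returns - baseline


# Usage on synthetic data: 6 trajectories of 5 steps, 2 features plus a bias column.
rng = np.random.default_rng(0)
traj_ids = np.repeat(np.arange(6), 5)
X = np.hstack([rng.normal(size=(30, 2)), np.ones((30, 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)
adv = kfold_advantages(X, y, traj_ids, k=3)
```

In this reading, K controls how much data each baseline sees: small K trains each baseline on fewer trajectories, while large K trains on nearly all of them. The resulting advantages would then replace the raw returns in the usual estimator $\nabla_\theta J \approx \frac{1}{N}\sum_{i,t} \nabla_\theta \log \pi_\theta(a_t^i \mid s_t^i)\,\big(R_t^i - b(s_t^i)\big)$.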
