Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy

Mingyuan Zhou; Mingzhang Yin; Yunhao Tang

arXiv: 1903.05284 · v1 · pith:GJCHRL7Onew · submitted 2019-03-13 · cs.LG · cs.AI · stat.ML

Due to the high variance of policy gradients, on-policy optimization algorithms are plagued by low sample efficiency. In this work, we propose the Augment-Reinforce-Merge (ARM) policy gradient estimator as an unbiased, low-variance alternative to previous baseline estimators for tasks with a binary action space, inspired by the recent ARM gradient estimator for discrete random variable models. We show that the ARM policy gradient estimator achieves variance reduction with theoretical guarantees and leads to significantly more stable and faster convergence of policies parameterized by neural networks.
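The abstract builds on the ARM gradient estimator for binary random variables. As a rough illustration, assuming the single-step identity from that earlier ARM work: for independent z_k ~ Bernoulli(σ(φ_k)) and shared noise u ~ Uniform(0,1)^K, the gradient ∇_φ E[f(z)] equals E_u[(f(1[u > σ(−φ)]) − f(1[u < σ(φ)])) (u − 1/2)]. The Python sketch below compares that estimator with a baseline-free REINFORCE estimate on a hypothetical toy reward f(z) = Σ_k (z_k − 0.49)², whose exact gradient 0.02 σ(φ_k)(1 − σ(φ_k)) can be checked by enumeration. All function names and the reward are illustrative assumptions; this is a one-step sketch, not the paper's multi-step policy gradient estimator.

```python
import itertools
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def arm_gradient(f, logits, rng):
    """One-sample ARM estimate of d/dlogits E[f(z)], z_k ~ Bernoulli(sigmoid(logits[k]))."""
    u = [rng.random() for _ in logits]
    # Augment: two binary vectors driven by the same uniform noise u.
    z_a = [1 if uk > sigmoid(-phi) else 0 for uk, phi in zip(u, logits)]
    z_b = [1 if uk < sigmoid(phi) else 0 for uk, phi in zip(u, logits)]
    # Merge: the reward difference, weighted by (u - 1/2), estimates the gradient.
    diff = f(z_a) - f(z_b)
    return [diff * (uk - 0.5) for uk in u]


def reinforce_gradient(f, logits, rng):
    """One-sample score-function (REINFORCE) estimate, no baseline."""
    z = [1 if rng.random() < sigmoid(phi) else 0 for phi in logits]
    fz = f(z)
    # d/dphi log Bernoulli(z; sigmoid(phi)) = z - sigmoid(phi)
    return [fz * (zk - sigmoid(phi)) for zk, phi in zip(z, logits)]


def exact_gradient(f, logits):
    """Exact gradient by enumerating all 2^K binary vectors (small K only)."""
    probs = [sigmoid(phi) for phi in logits]
    grad = [0.0] * len(logits)
    for z in itertools.product([0, 1], repeat=len(logits)):
        pz = math.prod(p if zk else 1.0 - p for zk, p in zip(z, probs))
        fz = f(list(z))
        for k, (zk, p) in enumerate(zip(z, probs)):
            grad[k] += fz * pz * (zk - p)
    return grad


def estimate_stats(estimator, f, logits, n_samples=20000, seed=0):
    """Per-coordinate sample mean and variance of a gradient estimator."""
    rng = random.Random(seed)
    samples = [estimator(f, logits, rng) for _ in range(n_samples)]
    means = [sum(col) / n_samples for col in zip(*samples)]
    variances = [
        sum((x - m) ** 2 for x in col) / n_samples
        for col, m in zip(zip(*samples), means)
    ]
    return means, variances


def toy_reward(z):
    # Hypothetical near-flat reward: sum_k (z_k - 0.49)^2.
    return sum((zk - 0.49) ** 2 for zk in z)


if __name__ == "__main__":
    logits = [0.0, 0.5, -0.5]
    print("exact    ", exact_gradient(toy_reward, logits))
    print("ARM      ", estimate_stats(arm_gradient, toy_reward, logits))
    print("REINFORCE", estimate_stats(reinforce_gradient, toy_reward, logits))
```

Because both binary vectors share one uniform draw, the reward difference is exactly zero whenever they coincide, which is one intuition for the much smaller spread of the ARM estimate in this toy case; the baseline-free REINFORCE estimate is instead scaled by the full magnitude of f(z).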
