Stochastic Variance-Reduced Policy Gradient

Damiano Binaghi; Giuseppe Canonaco; Marcello Restelli; Matteo Papini; Matteo Pirotta

arxiv: 1806.05618 · v1 · pith:WXSM7OQYnew · submitted 2018-06-14 · 💻 cs.LG · stat.ML

Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli

keywords: gradient, policy, stochastic, SVRPG, variance-reduced, algorithm, convergence, learning

abstract

In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
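To make the role of the importance weights concrete, here is a minimal sketch of an SVRPG-style update on a toy problem. It is an illustration under stated assumptions, not the authors' implementation. Each epoch computes a snapshot gradient mu from N trajectories sampled at the snapshot parameter, then takes m inner steps with the estimate v = mu + (1/B) * sum_i [ g(tau_i | theta) - w_i * g(tau_i | snapshot) ], where the mini-batch tau_i is drawn from the current policy theta and w_i = p(tau_i | snapshot) / p(tau_i | theta). Without w_i, the correction term would be biased, because the mini-batch is not sampled at the snapshot. Assumptions (all hypothetical): a scalar Gaussian policy, a toy episodic environment with optimum theta = 2, plain REINFORCE without a baseline as the per-trajectory estimator g, a fixed step size, and the last inner iterate as the next snapshot. The helper names sample_trajectory, reinforce_grad, importance_weight and svrpg are illustrative only.

```python
import math
import random

# Hypothetical toy problem (not from the paper): horizon-H episodes, states
# s ~ Uniform(0.5, 1.5) drawn independently, Gaussian policy a ~ N(theta*s, SIGMA^2),
# reward r = -(a - 2s)^2, so the optimal parameter is theta* = 2.
H, SIGMA = 2, 1.0


def sample_trajectory(theta, rng):
    """Roll out one episode; return (states, actions, undiscounted return)."""
    states, actions, ret = [], [], 0.0
    for _ in range(H):
        s = rng.uniform(0.5, 1.5)
        a = rng.gauss(theta * s, SIGMA)
        states.append(s)
        actions.append(a)
        ret += -(a - 2.0 * s) ** 2
    return states, actions, ret


def reinforce_grad(traj, theta):
    """REINFORCE estimate g(tau | theta): summed score functions times the return."""
    states, actions, ret = traj
    score = sum((a - theta * s) * s / SIGMA ** 2 for s, a in zip(states, actions))
    return score * ret


def importance_weight(traj, theta_num, theta_den):
    """w = p(tau | theta_num) / p(tau | theta_den); only policy terms remain,
    since transition probabilities do not depend on theta and cancel."""
    states, actions, _ = traj
    log_w = sum(
        ((a - theta_den * s) ** 2 - (a - theta_num * s) ** 2) / (2 * SIGMA ** 2)
        for s, a in zip(states, actions)
    )
    return math.exp(log_w)


def svrpg(theta0=0.0, epochs=40, m=5, n_snapshot=100, batch=10, alpha=0.02, seed=0):
    """Sketch of an SVRPG-style loop with a fixed step size (an assumption)."""
    rng = random.Random(seed)
    snapshot = theta0
    for _ in range(epochs):
        # Approximate full gradient at the snapshot from N fresh trajectories.
        mu = sum(
            reinforce_grad(sample_trajectory(snapshot, rng), snapshot)
            for _ in range(n_snapshot)
        ) / n_snapshot
        theta = snapshot
        for _ in range(m):
            # Mini-batch is sampled from the *current* policy theta, so the
            # snapshot term is reweighted to keep the estimate unbiased.
            correction = 0.0
            for _ in range(batch):
                traj = sample_trajectory(theta, rng)
                w = importance_weight(traj, snapshot, theta)
                correction += reinforce_grad(traj, theta) - w * reinforce_grad(traj, snapshot)
            v = mu + correction / batch
            theta += alpha * v  # gradient ascent on the expected return
        snapshot = theta  # last inner iterate becomes the next snapshot
    return theta
```

With these settings, calling svrpg() should return a parameter near the toy optimum of 2. On the first inner step of each epoch, theta equals the snapshot, so w = 1 and the correction vanishes, leaving v = mu. The abstract's "practical variants" are not reproduced here.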

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reinforcement Learning-based Control via Y-wise Affine Neural Networks (YANNs)
eess.SY 2025-08 unverdicted novelty 6.0

YANN-RL initializes RL actor and critic networks with explicit multi-parametric linear MPC solutions via YANNs to start from linear optimal control performance and then learn nonlinear policies through online interaction.