Pith · machine review for the scientific record

arxiv: 1510.09142 · v1 · submitted 2015-10-30 · 💻 cs.LG · cs.NE

Recognition: unknown

Learning Continuous Control Policies by Stochastic Value Gradients

Authors on Pith: no claims yet
classification: 💻 cs.LG · cs.NE
keywords: control, value, continuous, functions, learning, policies, stochastic, algorithms
0 comments

Abstract

We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
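The key trick the abstract describes, rewriting a stochastic transition as a deterministic function of exogenous noise so gradients can be backpropagated through it, can be illustrated with a toy sketch. This is not the paper's implementation: the linear policy, linear-Gaussian dynamics, and quadratic cost below are illustrative assumptions chosen so the optimum is known in closed form (the optimal gain cancels the state, theta = -1).

```python
import random

def svg_grad(theta, s, sigma, n_samples=1000, rng=None):
    """Monte-Carlo estimate of d E[cost] / d theta.

    The stochastic transition s' = s + a + sigma*eps is written with
    eps drawn as exogenous noise (the reparameterization the abstract
    refers to), so each sampled transition is differentiable in theta.
    Policy: a = theta * s.  Cost: c(s') = s'**2.
    """
    rng = rng or random.Random(0)
    grad = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)        # exogenous noise sample
        a = theta * s                    # deterministic policy
        s_next = s + a + sigma * eps     # reparameterized dynamics
        # chain rule: d(s_next**2)/d theta = 2*s_next * (ds_next/da)*(da/d theta)
        grad += 2.0 * s_next * s
    return grad / n_samples

# Gradient descent on the policy parameter; it should approach -1,
# the gain that cancels the state in expectation.
theta = 0.0
for _ in range(200):
    theta -= 0.05 * svg_grad(theta, s=1.0, sigma=0.1)
```

The same idea scales up in the paper by replacing the hand-derived chain rule with backpropagation through learned neural-network models and value functions.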

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. High-Dimensional Continuous Control Using Generalized Advantage Estimation

    cs.LG · 2015-06 · accept · novelty 8.0

    Generalized advantage estimation combined with trust region optimization enables stable neural network policy learning for complex continuous control from raw kinematics.