Classical Policy Gradient: Preserving Bellman's Principle of Optimality

Chris Nota; James Kostas; Philip S. Thomas; Scott M. Jordan; Yash Chandak

arXiv: 1906.03063 · v1 · submitted 2019-06-06 · cs.LG · stat.ML

Abstract

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.
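As background only, and not the objective proposed in the paper: in a finite-horizon episodic MDP, Bellman's principle of optimality says that the tail of an optimal policy, started from any state at any time step, is itself optimal for the remaining steps. The sketch below illustrates this with textbook backward induction. The function name `backward_induction`, the table layouts `P` and `R`, and the two-state toy MDP are hypothetical choices made for illustration.

```python
def backward_induction(P, R, horizon):
    """Textbook finite-horizon backward induction (illustrative sketch).

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the expected immediate reward for taking action a in state s.
    Returns (V, pi): V[t][s] is the optimal value with horizon - t steps left,
    and pi[t][s] is an optimal action at time step t in state s.
    """
    n_states = len(R)
    # V[horizon] stays all zeros: no reward after the episode ends.
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    pi = [[0] * n_states for _ in range(horizon)]
    # Work backwards: the value at step t depends only on the value at t + 1.
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            best_a, best_q = 0, float("-inf")
            for a in range(len(R[s])):
                q = R[s][a] + sum(p * V[t + 1][s2] for s2, p in P[s][a])
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s] = best_q
            pi[t][s] = best_a
    return V, pi


# Hypothetical two-state, two-action MDP with deterministic transitions.
# State 0: action 0 earns 1 and stays; action 1 earns 0 and moves to state 1.
# State 1: action 0 earns 3 and stays; action 1 earns 0 and moves to state 0.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[1.0, 0.0],
     [3.0, 0.0]]

V, pi = backward_induction(P, R, horizon=3)
print(V[0])  # [6.0, 9.0]
print(pi)    # [[1, 0], [1, 0], [0, 0]]
```

The resulting policy is time-dependent: in state 0 it moves to state 1 early, but on the last step it takes the immediate reward of 1 instead. The principle of optimality shows up as a consistency check: solving the 2-step problem from scratch, `backward_induction(P, R, 2)[0][0]`, gives the same values as `V[1]`, the tail of the 3-step solution.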