arxiv: 1906.03063 · v1 · pith:4HH4SYWMnew · submitted 2019-06-06 · 💻 cs.LG · stat.ML

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

classification 💻 cs.LG stat.ML
keywords bellman gradient objective optimality principle better captures classical

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.
