Classical Policy Gradient: Preserving Bellman's Principle of Optimality
classification
cs.LG
stat.ML
keywords
bellman, gradient, objective, optimality, principle, better, captures, classical
abstract
We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.
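For readers unfamiliar with the setting, the following is a minimal sketch of textbook backward induction in a finite-horizon MDP, which is the standard computational form of Bellman's principle of optimality. It is not the objective or gradient proposed in the paper, which the abstract does not specify. The function name `backward_induction`, the array layouts, and the two-state toy problem are all illustrative assumptions.

```python
def backward_induction(P, R, H):
    """Textbook finite-horizon dynamic programming (illustrative only).

    P[a][s][s2]: probability of moving from s to s2 under action a (assumed layout).
    R[s][a]:     immediate reward for taking action a in state s (assumed layout).
    H:           horizon, i.e. number of decision steps per episode.
    Returns (V, pi) with V[t][s] the optimal value-to-go and pi[t][s] a greedy action.
    """
    n_states = len(R)
    n_actions = len(R[0])
    # V[H] is the terminal value, taken to be zero here.
    V = [[0.0] * n_states for _ in range(H + 1)]
    pi = [[0] * n_states for _ in range(H)]
    # Work backwards: the optimal tail from step t+1 is reused at step t.
    for t in range(H - 1, -1, -1):
        for s in range(n_states):
            q = [
                R[s][a] + sum(P[a][s][s2] * V[t + 1][s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])  # ties go to the lowest index
            pi[t][s] = best
            V[t][s] = q[best]
    return V, pi


# Hypothetical toy problem: action 0 stays put, action 1 switches state,
# and being in state 1 pays reward 1 regardless of the action taken.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [[0.0, 0.0], [1.0, 1.0]]
V, pi = backward_induction(P, R, H=3)
```

In this toy problem the time-dependent policy switches out of state 0 and stays in state 1 while steps remain, which shows why finite-horizon optimal policies are generally indexed by the time step as well as the state.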