pith. sign in

arxiv: 1807.03223 · v3 · pith:NIMLG5BLnew · submitted 2018-07-09 · 🧮 math.OC · cs.AI

Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints

classification 🧮 math.OC cs.AI
keywords entropypathslogictemporalmaximizesmaximumpolicyspecification
0
0 comments X
read the original abstract

We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes the exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite or unbounded. We provide necessary and sufficient conditions under which the maximum entropy of an MDP is finite, infinite or unbounded. We then present an algorithm which is based on a convex optimization problem to synthesize a policy that maximizes the entropy of an MDP. We also show that maximizing the entropy of an MDP is equivalent to maximizing the entropy of the paths that reach a certain set of states in the MDP. Finally, we extend the algorithm to an MDP subject to a temporal logic specification. In numerical examples, we demonstrate the proposed method on different motion planning scenarios and illustrate the relation between the restrictions imposed on the paths by a specification, the maximum entropy, and the predictability of paths.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Entropy-Regularized Stochastic Games

    math.OC 2019-07 unverdicted novelty 6.0

    Entropy-regularized stochastic games are defined with proofs of value existence for N-stage and discounted cases, sufficiency of Markovian and stationary strategies, and convex optimization algorithms for computation.