
arxiv: 1705.03597 · v1 · pith:A26B3KL4new · submitted 2017-05-10 · 💻 cs.AI

Solving Multi-Objective MDP with Lexicographic Preference: An application to stochastic planning with multiple quantile objective

classification 💻 cs.AI
keywords policy · lexicographic · multi-objective · preference · when · accumulated · agent · applications

In the most common settings of a Markov Decision Process (MDP), an agent evaluates a policy based on the expectation of the (discounted) sum of rewards. However, in many applications this criterion may not be suitable, for two reasons. First, in risk-averse situations the expectation of accumulated rewards is not robust enough; this is the case when the distribution of the accumulated reward is heavily skewed. Second, many applications naturally take several objectives into consideration when evaluating a policy; for instance, in autonomous driving an agent needs to balance speed and safety when choosing an appropriate decision. In this paper, we consider evaluating a policy based on the sequence of quantiles it induces on a set of target states; our idea is to reformulate the original problem as a multi-objective MDP problem with a naturally defined lexicographic preference. To compute an optimal policy, we propose an algorithm, FLMDP, that can solve a general multi-objective MDP with lexicographic reward preference.
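To illustrate what a lexicographic preference over several objectives means, here is a minimal Python sketch. It is not the FLMDP algorithm from the paper. It only compares precomputed value vectors, where earlier entries take strict priority over later ones. The function names, the tolerance `tol`, and the example policies and numbers are all hypothetical.

```python
from typing import Dict, Optional, Sequence


def lex_better(a: Sequence[float], b: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if value vector `a` is strictly preferred to `b`.

    Objectives are compared in priority order; a later objective only
    matters when all earlier ones are tied (within `tol`).
    """
    for x, y in zip(a, b):
        if x > y + tol:
            return True
        if x < y - tol:
            return False
    return False  # all objectives tied


def lex_best(candidates: Dict[str, Sequence[float]]) -> Optional[str]:
    """Return the name of a lexicographically best candidate (None if empty)."""
    best_name: Optional[str] = None
    for name, values in candidates.items():
        if best_name is None or lex_better(values, candidates[best_name]):
            best_name = name
    return best_name


# Hypothetical value vectors: (safety score, speed score), safety first.
policies = {
    "cautious": (0.95, 3.0),
    "balanced": (0.95, 5.0),
    "fast": (0.80, 9.0),
}

print(lex_best(policies))  # prints "balanced"
```

Here "balanced" wins because it ties with "cautious" on the first objective and is better on the second, while "fast" is ruled out by the first objective alone, however large its second entry is.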
