arxiv: 1505.07495 · v2 · pith:2S5ENRW6new · submitted 2015-05-27 · 🧮 math.OC

Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes

classification 🧮 math.OC
keywords value · average · decision-maker · epsilon · gambling · houses · pathwise · sigma
In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a very robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, it shows that for any epsilon > 0, the decision-maker has a pure strategy sigma that is epsilon-optimal in every n-stage game, provided that n is sufficiently large (this result was previously known only for behavior strategies, that is, strategies that use randomization). Second, the strategy sigma can be chosen such that, under the long-run average payoff criterion (expectation of the liminf of the average payoffs), the decision-maker obtains more than lim v(n) - epsilon.
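
The two claims in the abstract can be written out as follows. This is only a sketch in assumed notation, not the paper's own: g_m stands for the payoff at stage m, v_n for the n-stage value (written v(n) in the abstract), E_sigma for the expectation under strategy sigma, and the dependence on the initial state is suppressed.

```latex
% Assumed notation (not taken from the abstract):
%   g_m       payoff received at stage m
%   E_\sigma  expectation under strategy \sigma (initial state suppressed)
%   v_n       value of the n-stage problem, v(n) in the abstract
\[
  v_n \;=\; \sup_{\sigma}\; \mathbb{E}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} g_m\right]
\]

% First claim: one pure strategy is epsilon-optimal in every long enough n-stage problem.
\[
  \forall \varepsilon > 0,\ \ \exists\, \sigma \text{ pure},\ \ \exists\, N,\ \
  \forall n \ge N:\qquad
  \mathbb{E}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} g_m\right] \;\ge\; v_n - \varepsilon .
\]

% Second claim: the same sigma can be chosen so that the expected liminf
% of the average payoffs exceeds lim v_n - epsilon.
\[
  \mathbb{E}_{\sigma}\!\left[\liminf_{n\to\infty}\,\frac{1}{n}\sum_{m=1}^{n} g_m\right]
  \;>\; \lim_{n\to\infty} v_n \;-\; \varepsilon .
\]
```

In the second inequality the liminf sits inside the expectation, so the guarantee concerns the average payoff along each realized path; this is presumably what "pathwise" in the title refers to.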