Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes

Bruno Ziliotto; Xavier Venel

arxiv: 1505.07495 · v2 · pith:2S5ENRW6new · submitted 2015-05-27 · 🧮 math.OC



keywords: value, average, decision-maker, epsilon, gambling, houses, pathwise, sigma


In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a very robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, it shows that for any epsilon>0, the decision-maker has a pure strategy sigma which is epsilon-optimal in any n-stage game, provided that n is large enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, the strategy sigma can be chosen such that under the long-run average payoff criterion (expectation of the liminf of the average payoffs), the decision-maker obtains more than lim v(n)-epsilon.
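The two properties stated in the abstract can be written out symbolically. The block below is only a sketch under assumed notation: the stage payoff $g_m$, the expectation $\mathbb{E}_{\sigma}$ under strategy $\sigma$, and the threshold $N$ are symbols introduced here for illustration and do not appear in the abstract, and dependence on the initial state is suppressed. Only $v(n)$, $\sigma$ and $\varepsilon$ come from the source. The second inequality is written with $\geq$; since $\varepsilon$ is arbitrary, this is equivalent to the abstract's "more than".

```latex
% Assumed notation: g_m is the payoff at stage m, v(n) the value of the
% n-stage problem, E_sigma the expectation induced by the strategy sigma.
% For every epsilon > 0 there exist a pure strategy sigma and an integer N with:

% (1) sigma is epsilon-optimal in every sufficiently long n-stage problem
\[
  \forall n \geq N : \qquad
  \mathbb{E}_{\sigma}\!\left[ \frac{1}{n} \sum_{m=1}^{n} g_m \right]
  \;\geq\; v(n) - \varepsilon .
\]

% (2) the same sigma is good for the long-run average payoff criterion
\[
  \mathbb{E}_{\sigma}\!\left[ \liminf_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} g_m \right]
  \;\geq\; \lim_{n \to \infty} v(n) - \varepsilon .
\]
```

The difference between the two statements is the position of the expectation: in (1) it is taken over the n-stage average, while in (2) it is taken over the liminf of the averages along each play path, which is the "pathwise" aspect named in the title.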
