Pathwise uniform value in gambling houses and Partially Observable Markov Decision Processes
classification
🧮 math.OC
keywords
value, average, decision-maker, epsilon, gambling, houses, pathwise, sigma
In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a very robust notion of value for the infinitely repeated problem, namely the pathwise uniform value. This solves two open problems. First, it shows that for any epsilon > 0, the decision-maker has a pure strategy sigma which is epsilon-optimal in any n-stage game, provided that n is large enough (this result was previously known only for behavior strategies, that is, strategies which use randomization). Second, the strategy sigma can be chosen such that, under the long-run average payoff criterion (expectation of the liminf of the average payoffs), the decision-maker obtains more than lim v(n) - epsilon.
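The two statements in the abstract can be written out symbolically. The following is only a sketch under assumed notation: the symbols r_m (payoff at stage m), gamma_n(sigma) (expected n-stage average payoff under sigma) and N (a stage threshold) are introduced here for illustration and do not appear in the abstract, and the dependence on the initial state is suppressed.

```latex
% Assumed notation (not from the abstract): r_m is the stage-m payoff,
% \gamma_n(\sigma) is the expected average payoff over the first n stages,
% and v(n) is the value of the n-stage problem.
\[
  \gamma_n(\sigma) = \mathbb{E}_\sigma\!\left[\frac{1}{n}\sum_{m=1}^{n} r_m\right],
  \qquad
  v(n) = \sup_{\sigma} \gamma_n(\sigma).
\]

% First statement: a single pure strategy is epsilon-optimal
% in every sufficiently long n-stage problem.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma \text{ pure},\ \exists N,\ \forall n \ge N:
  \quad \gamma_n(\sigma) \ge v(n) - \varepsilon .
\]

% Second statement: the same sigma also performs well under the
% long-run average criterion (expectation of the liminf of average payoffs).
\[
  \mathbb{E}_\sigma\!\left[\liminf_{n\to\infty} \frac{1}{n}\sum_{m=1}^{n} r_m\right]
  > \lim_{n\to\infty} v(n) - \varepsilon .
\]
```

Read this way, the second inequality is the stronger one: the liminf is taken along each play path before the expectation, whereas the first statement controls only the expected n-stage averages.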