A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

Peter Auer; Pratik Gajane; Ronald Ortner

arXiv: 1805.10066 · v1 · submitted 2018-05-25 · cs.LG · stat.ML

Abstract

We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding window approach and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We also characterize the optimal window size suitable for our algorithm. These results are complemented by a sample complexity bound on the number of sub-optimal steps taken by the algorithm. Finally, we present some experimental results to support our theoretical analysis.
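
The abstract does not spell out how the sliding window is used. As a minimal sketch, assuming a tabular MDP with hashable states and actions, the general idea can be illustrated by estimating rewards and transition probabilities only from the most recent `window` observed transitions, so that data collected before a change is eventually forgotten. The class `SlidingWindowEstimator` and its methods are hypothetical names chosen for illustration. This is not the authors' algorithm: how the algorithm acts on such estimates (exploration, policy computation) is not described in the abstract and is not reproduced here.

```python
from collections import defaultdict, deque


class SlidingWindowEstimator:
    """Hypothetical helper: empirical reward and transition estimates
    computed only from the last `window` observed transitions (tabular MDP assumed)."""

    def __init__(self, window):
        self.window = window
        self.history = deque()                 # (s, a, r, s_next), oldest first
        self.counts = defaultdict(int)         # visits to (s, a) inside the window
        self.reward_sums = defaultdict(float)  # summed rewards for (s, a) inside the window
        self.next_counts = defaultdict(int)    # observed (s, a) -> s_next inside the window

    def observe(self, s, a, r, s_next):
        # Add the newest transition to the window statistics.
        self.history.append((s, a, r, s_next))
        self.counts[(s, a)] += 1
        self.reward_sums[(s, a)] += r
        self.next_counts[(s, a, s_next)] += 1
        # Once the window is over capacity, forget the oldest transition.
        if len(self.history) > self.window:
            old_s, old_a, old_r, old_next = self.history.popleft()
            self.counts[(old_s, old_a)] -= 1
            self.reward_sums[(old_s, old_a)] -= old_r
            self.next_counts[(old_s, old_a, old_next)] -= 1

    def mean_reward(self, s, a):
        n = self.counts.get((s, a), 0)
        # Unvisited pairs get 0.0 here; a real algorithm needs its own rule for them.
        return self.reward_sums[(s, a)] / n if n > 0 else 0.0

    def transition_prob(self, s, a, s_next):
        n = self.counts.get((s, a), 0)
        return self.next_counts.get((s, a, s_next), 0) / n if n > 0 else 0.0


# Usage: with window=3, the first transition (reward 1.0, next state 1) is dropped.
est = SlidingWindowEstimator(window=3)
est.observe(0, 0, 1.0, 1)
est.observe(0, 0, 1.0, 1)
est.observe(0, 0, 0.0, 0)
est.observe(0, 0, 0.0, 0)
print(est.mean_reward(0, 0))         # mean of the windowed rewards 1.0, 0.0, 0.0
print(est.transition_prob(0, 0, 0))  # 2 of the 3 windowed transitions went to state 0
```

Keeping the window in a `deque` makes each update constant-time: the newest transition is added and, once the window is full, the oldest one is subtracted from the counts. The window length trades estimation accuracy (more samples) against how quickly the estimates adapt after the MDP changes; the paper's characterization of the optimal window size is not reproduced in this sketch.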
