Posterior Sampling for Large Scale Reinforcement Learning

Georgios Theocharous; Nikos Vlassis; Yasin Abbasi-Yadkori; Zheng Wen

arxiv: 1711.07979 · v3 · pith:UXHSCFCOnew · submitted 2017-11-21 · 💻 cs.LG · cs.AI

keywords: algorithm, psrl, problems, algorithms, assumptions, continuous, deterministic, large

We propose a practical non-episodic posterior sampling for reinforcement learning (PSRL) algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result applies more generally to problems with multiple parameters and to continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm are satisfied by a sensible parametrization for a large class of problems in sequential recommendations.
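To make "deterministic, model-independent episode switching schedule" concrete, here is a minimal sketch of one such schedule. It assumes a doubling rule in which a new episode begins whenever the step count reaches a power of two. The abstract does not specify the rule, so this is an illustrative assumption, and the names `is_switch_time` and `episode_starts` are hypothetical.

```python
from typing import List


def is_switch_time(t: int) -> bool:
    """Return True if 1-indexed step t starts a new episode.

    Assumption for illustration: episodes start at powers of two
    (t = 1, 2, 4, 8, ...). The rule depends only on t, never on the
    observed data or the sampled model.
    """
    return t >= 1 and (t & (t - 1)) == 0


def episode_starts(horizon: int) -> List[int]:
    """List every step in 1..horizon at which a new episode would begin."""
    return [t for t in range(1, horizon + 1) if is_switch_time(t)]


if __name__ == "__main__":
    print(episode_starts(20))
```

Under this assumed rule, the number of episodes grows only logarithmically with the horizon. A posterior-sampling agent would therefore draw a fresh model and replan only that often.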
