On Applications of Bootstrap in Continuous Space Reinforcement Learning

Ambuj Tewari; George Michailidis; Mohamad Kazem Shirani Faradonbeh

arxiv: 1903.05803 · v2 · pith:WZHJJWGCnew · submitted 2019-03-14 · cs.LG · cs.SY · stat.ML



keywords: policies, learning, applications, continuous, linear, reinforcement, results, accuracy


In decision making problems for continuous state and action spaces, linear dynamical models are widely employed. Specifically, policies for stochastic linear systems subject to quadratic cost functions capture a large number of applications in reinforcement learning. Selected randomized policies have been studied in the literature recently that address the trade-off between identification and control. However, little is known about policies based on bootstrapping observed states and actions. In this work, we show that bootstrap-based policies achieve a square root scaling of regret with respect to time. We also obtain results on the accuracy of learning the model's dynamics. Corroborative numerical analysis that illustrates the technical results is also provided.
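
As a rough sketch of the setting the abstract describes, the standard linear-quadratic formulation can be written as below. The symbols ($A$, $B$, $Q$, $R$, $w_t$, $J^\star$, $\mathcal{R}(T)$) are conventional notation assumed here for illustration; they do not appear on this page, and the paper's own definitions may differ.

```latex
\begin{align*}
x_{t+1} &= A x_t + B u_t + w_t, \\
c_t &= x_t^\top Q x_t + u_t^\top R u_t, \\
\mathcal{R}(T) &= \sum_{t=1}^{T} \left( c_t - J^\star \right).
\end{align*}
```

Here $x_t$ is the state, $u_t$ the action, $w_t$ the noise, and $J^\star$ the optimal long-run average cost when $A$ and $B$ are known. In this notation, the square-root scaling claimed in the abstract means that $\mathcal{R}(T)$ grows on the order of $\sqrt{T}$; whether logarithmic factors appear is not stated in the abstract.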
