On a convergent off-policy temporal difference learning algorithm in on-line learning environment

Prasenjit Karmakar; Rajkumar Maity; Shalabh Bhatnagar

arxiv: 1605.06076 · v1 · pith:KQ2C2RURnew · submitted 2016-05-19 · 💻 cs.LG

keywords: learning, algorithm, difference, environment, linear, policy, results, temporal

In this paper we provide a rigorous convergence analysis of an "off"-policy temporal difference learning algorithm with linear function approximation and per-time-step linear computational complexity in an "online" learning environment. The algorithm considered here is TDC with importance weighting, introduced by Maei et al. We support our theoretical results with empirical results on standard off-policy counterexamples.
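
To make the object of study concrete, here is a minimal sketch of a single TDC update with importance weighting under linear function approximation. It follows one commonly stated form of the update and is not taken from the paper itself: the function name `tdc_step`, the argument names, and the exact placement of the importance ratio `rho` in the auxiliary update are assumptions for illustration. Each step uses only dot products and vector additions, so its cost is linear in the number of features.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def tdc_step(
    theta: Vector,     # main weights: value estimate is dot(theta, phi)
    w: Vector,         # auxiliary weights used for the gradient correction
    phi: Vector,       # features of the current state
    phi_next: Vector,  # features of the next state
    reward: float,
    rho: float,        # importance ratio pi(a|s) / mu(a|s) (assumed given)
    gamma: float,      # discount factor
    alpha: float,      # step size for theta (slower timescale)
    beta: float,       # step size for w (faster timescale)
) -> Tuple[Vector, Vector]:
    """One importance-weighted TDC update (illustrative form, see lead-in)."""
    # Temporal difference error under the current main weights.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    phi_w = dot(phi, w)

    # Main update: TD term minus the gradient-correction term.
    new_theta = [
        t + alpha * rho * (delta * p - gamma * pn * phi_w)
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    # Auxiliary update: track the importance-weighted TD error.
    new_w = [wi + beta * (rho * delta - phi_w) * p for wi, p in zip(w, phi)]
    return new_theta, new_w


if __name__ == "__main__":
    theta, w = tdc_step(
        theta=[0.0, 0.0], w=[0.0, 0.0],
        phi=[1.0, 0.0], phi_next=[0.0, 1.0],
        reward=1.0, rho=2.0, gamma=0.9, alpha=0.1, beta=0.5,
    )
    print(theta, w)
```

In an online setting this function would be called once per observed transition, with `alpha` and `beta` replaced by decaying step-size sequences; the specific step-size conditions required for convergence are those stated in the paper and are not reproduced here.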
