Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

Prasenjit Karmakar; Shalabh Bhatnagar

arxiv: 1503.09105 · v14 · pith:SMGE4LPTnew · submitted 2015-03-31 · 🧮 math.DS · cs.AI · stat.ML

keywords: controlled, markov, noise, approximation, difference, asymptotic, convergence, learning

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time-scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, we present a solution to the off-policy convergence problem for temporal difference learning with linear function approximation, using our results.
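
For orientation, a generic pair of coupled recursions of the kind the abstract describes can be sketched as follows. The notation here is an assumption chosen for illustration and is not taken from the abstract: $x_n$ is the faster iterate, $y_n$ the slower one, $Z^{(i)}_n$ are the controlled Markov noise processes, $M^{(i)}_{n+1}$ are martingale difference sequences, and $a(n)$, $b(n)$ are step sizes. "Non-additive" is read here as the Markov noise entering as an argument of the drift functions $h$ and $g$ rather than as an added term.

```latex
\begin{align*}
x_{n+1} &= x_n + a(n)\left[ h\big(x_n, y_n, Z^{(1)}_n\big) + M^{(1)}_{n+1} \right]
  && \text{(faster recursion)} \\
y_{n+1} &= y_n + b(n)\left[ g\big(x_n, y_n, Z^{(2)}_n\big) + M^{(2)}_{n+1} \right]
  && \text{(slower recursion)} \\
\sum_n a(n) &= \sum_n b(n) = \infty, \qquad
\sum_n \big( a(n)^2 + b(n)^2 \big) < \infty, \qquad
\frac{b(n)}{a(n)} \to 0 .
\end{align*}
```

The step-size conditions shown are the standard ones for two time-scale schemes; the exact assumptions used in the paper may differ.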
