Fastest Convergence for Q-learning

Adithya M. Devraj; Sean P. Meyn

arXiv: 1707.03770 · v2 · pith:AT5KM4FGnew · submitted 2017-07-12 · cs.SY · cs.LG · math.OC


The Zap Q-learning algorithm introduced in this paper improves on Watkins' original algorithm and on recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that its transient behavior closely matches a deterministic Newton-Raphson implementation. This is made possible by a two-time-scale update equation for the matrix gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even for non-ideal parameterized settings. Numerical experiments confirm the quick convergence, even in such non-ideal cases. A secondary goal of this paper is tutorial: the first half of the paper contains a survey of reinforcement learning algorithms, with a focus on minimum-variance algorithms.
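The abstract describes the update only at a high level, so the following is a minimal sketch, in plain Python, of a matrix-gain, two-time-scale Q-learning recursion in that spirit. It is not the authors' code, and the exact form of the recursion is an assumption based on the abstract's description. Everything concrete here is hypothetical: the 2-state, 2-action MDP (`COST`, `P_TO_1`, `BETA`), the tabular one-hot basis, the uniform exploration policy, the initial gain estimate -I, and the step sizes alpha_n = 1/(n+1) and gamma_n = alpha_n^rho with rho = 0.85. On the faster time scale, `a_hat` tracks the mean of A_n = psi_n (beta psi' - psi_n)^T. On the slower one, `theta` takes a Newton-Raphson-like step, -alpha_n a_hat^{-1} psi_n d_n.

```python
import random

# Hypothetical toy MDP (not from the paper); costs are minimised.
COST = [[1.0, 0.5], [0.0, 2.0]]    # COST[x][u]
P_TO_1 = [[0.2, 0.8], [0.5, 0.9]]  # P_TO_1[x][u] = Prob(next state = 1)
BETA = 0.7                         # discount factor
NS, NA = 2, 2
D = NS * NA                        # one-hot (tabular) feature per (x, u)


def idx(x, u):
    """Index of the one-hot feature psi(x, u)."""
    return x * NA + u


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - s) / m[r][r]
    return y


def q_star(iters=500):
    """Reference Q* by value iteration, used only to check the sketch."""
    q = [0.0] * D
    for _ in range(iters):
        v = [min(q[idx(x, u)] for u in range(NA)) for x in range(NS)]
        q = [COST[x][u] + BETA * ((1 - P_TO_1[x][u]) * v[0] + P_TO_1[x][u] * v[1])
             for x in range(NS) for u in range(NA)]
    return q


def zap_q(n_steps=20000, rho=0.85, seed=0):
    """Matrix-gain, two-time-scale Q-learning sketch (tabular basis)."""
    rng = random.Random(seed)
    theta = [0.0] * D
    # Assumed initial gain estimate: -I keeps a_hat invertible from step 1.
    a_hat = [[-1.0 if r == c else 0.0 for c in range(D)] for r in range(D)]
    x = 0
    for n in range(1, n_steps + 1):
        u = rng.randrange(NA)  # uniform exploration (assumption)
        x_next = 1 if rng.random() < P_TO_1[x][u] else 0
        # Greedy (cost-minimising) action at the next state under theta.
        u_next = min(range(NA), key=lambda b: theta[idx(x_next, b)])
        i, j = idx(x, u), idx(x_next, u_next)
        d = COST[x][u] + BETA * theta[j] - theta[i]  # temporal difference
        alpha = 1.0 / (n + 1)   # slow step size for theta
        gamma = alpha ** rho    # faster step size for the matrix gain
        # a_hat += gamma * (A_n - a_hat), where
        # A_n = psi(x,u) (BETA psi(x',u') - psi(x,u))^T is nonzero only in row i.
        for r in range(D):
            for c in range(D):
                a_hat[r][c] *= 1.0 - gamma
        a_hat[i][i] -= gamma
        a_hat[i][j] += gamma * BETA
        # Newton-Raphson-like step: theta -= alpha * a_hat^{-1} psi(x,u) d.
        rhs = [0.0] * D
        rhs[i] = d
        step = solve(a_hat, rhs)
        theta = [t - alpha * s for t, s in zip(theta, step)]
        x = x_next
    return theta
```

On this toy example, comparing `zap_q()` with `q_star()` is a quick sanity check that the estimates settle near the value-iteration solution. The sketch calls a linear solve rather than forming a_hat^{-1} explicitly, which is the usual numerically safer choice.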
