
arxiv: 1512.05509 · v1 · pith:ISE3X7VDnew · submitted 2015-12-17 · cs.NE · cs.AI · cs.LG

An Empirical Comparison of Neural Architectures for Reinforcement Learning in Partially Observable Environments

classification: cs.NE, cs.AI, cs.LG
keywords: learning, neural, architectures, recurrent, advantage, better, environments, fitted

This paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and MUT1, a recurrent neural architecture evolved from a pool of several thousand candidate architectures. A variant of fitted Q iteration, based on Advantage values instead of Q values, is also explored. The results show that GRU performs significantly better than LSTM and MUT1 on most of the problems considered, requiring fewer training episodes and less CPU time before learning a very good policy. Advantage learning also tends to produce better results.
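The abstract does not spell out the update rules, so the following is only a minimal sketch of how the regression targets for fitted Q iteration and for an Advantage-based variant might differ. It assumes the textbook Q target, r + gamma * max Q(s', a'), and a commonly cited form of advantage learning, V(s) + (r + gamma * V(s') - V(s)) / kappa with V(s) = max A(s, a). The function names, the `Transition` layout, and the values of `gamma` and `kappa` are hypothetical, and the recurrent network that would produce the per-action values from the observation history is omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (values at s, action taken, reward, values at s', episode ended)
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


def q_target(reward: float, next_values: Sequence[float], done: bool,
             gamma: float = 0.9) -> float:
    """Textbook fitted Q iteration target: r + gamma * max_a' Q(s', a')."""
    bootstrap = 0.0 if done else gamma * max(next_values)
    return reward + bootstrap


def advantage_target(values: Sequence[float], reward: float,
                     next_values: Sequence[float], done: bool,
                     gamma: float = 0.9, kappa: float = 0.5) -> float:
    """Assumed advantage-learning target: V(s) + (r + gamma * V(s') - V(s)) / kappa."""
    v = max(values)                              # V(s) = max_a A(s, a)
    v_next = 0.0 if done else max(next_values)   # V(s') is zero after a terminal step
    return v + (reward + gamma * v_next - v) / kappa


def build_targets(batch: Sequence[Transition], use_advantage: bool = False,
                  gamma: float = 0.9, kappa: float = 0.5) -> List[List[float]]:
    """Build one regression target vector per transition in the batch."""
    targets = []
    for values, action, reward, next_values, done in batch:
        target = list(values)  # outputs for actions not taken are left unchanged
        if use_advantage:
            target[action] = advantage_target(values, reward, next_values,
                                              done, gamma, kappa)
        else:
            target[action] = q_target(reward, next_values, done, gamma)
        targets.append(target)
    return targets
```

In this formulation, setting `kappa` to 1 makes the advantage target coincide with the Q target, while smaller values widen the gap between the best action and the others. In a full training loop, the network would be refit to these targets after each batch of episodes.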
