An Empirical Comparison of Neural Architectures for Reinforcement Learning in Partially Observable Environments

Denis Steckelmacher; Peter Vrancx

arXiv: 1512.05509 · v1 · pith:ISE3X7VDnew · submitted 2015-12-17 · cs.NE · cs.AI · cs.LG

keywords: learning, neural, architectures, recurrent, advantage, better, environments, fitted

This paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU) and MUT1, a recurrent neural architecture evolved from a pool of several thousand candidate architectures. A variant of fitted Q iteration, based on Advantage values instead of Q values, is also explored. The results show that GRU performs significantly better than LSTM and MUT1 for most of the problems considered, requiring fewer training episodes and less CPU time before learning a very good policy. Advantage learning also tends to produce better results.
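
To make the difference between the two kinds of regression targets concrete, the following is a minimal sketch, not the paper's implementation. It assumes a small lookup table in place of the recurrent network, fully observed integer states in place of observation histories, and the advantage-learning update in the form usually attributed to Baird; the abstract does not give the exact variant used. The function names, the `kappa` scaling parameter and the toy transitions are all hypothetical.

```python
# Sketch only: tabular stand-in for the value model (assumption; the paper
# trains recurrent networks on observation histories).

def q_targets(q, transitions, gamma=0.9):
    """Fitted Q iteration targets: r + gamma * max_a' Q(s', a').

    q maps a state to a list of action values; transitions is a list of
    (state, action, reward, next_state, done) tuples.
    """
    targets = []
    for s, a, r, s_next, done in transitions:
        bootstrap = 0.0 if done else gamma * max(q[s_next])
        targets.append(((s, a), r + bootstrap))
    return targets


def advantage_targets(adv, transitions, gamma=0.9, kappa=0.5):
    """Advantage-learning targets (assumed Baird-style form):

        max_a A(s, a) + (r + gamma * max_a' A(s', a') - max_a A(s, a)) / kappa

    With kappa = 1 this reduces to the Q target above; kappa < 1 widens the
    gap between the best action and the others.
    """
    targets = []
    for s, a, r, s_next, done in transitions:
        best_here = max(adv[s])
        bootstrap = 0.0 if done else gamma * max(adv[s_next])
        targets.append(((s, a), best_here + (r + bootstrap - best_here) / kappa))
    return targets


# Toy data (hypothetical): two states, two actions.
values = {0: [0.0, 1.0], 1: [2.0, 0.5]}
batch = [(0, 1, 1.0, 1, False), (1, 0, 0.0, 0, True)]

q_out = q_targets(values, batch)
adv_out = advantage_targets(values, batch, kappa=0.5)
adv_as_q = advantage_targets(values, batch, kappa=1.0)
```

In a fitted iteration loop, targets like these would be recomputed from the current model over the whole batch of collected transitions, and the model would then be retrained on them by supervised regression.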
