
arXiv: 1705.03562 · v1 · pith:OGQRKVTXnew · submitted 2017-05-09 · stat.ML · cs.AI · cs.LG

Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

classification: stat.ML, cs.AI, cs.LG
keywords: deep, devi, model-based, episodic, iteration, learning, reinforcement, structure

We present a new deep meta-reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. Our model is trained end-to-end via back-propagation. Despite being trained using the model-free Q-learning objective, we show that DEVI's model-based internal structure provides 'one-shot' transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.
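
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea of non-parametric, similarity-based value iteration over an episodic memory, not the authors' implementation. It assumes a fixed softmax kernel over squared Euclidean distance in place of DEVI's learned deep similarity metric, and it does no training at all. The names `similarity_weights` and `plan_q_function`, the memory tuple layout, and the toy chain task are all hypothetical.

```python
import math


def similarity_weights(query, keys, temperature=1.0):
    """Softmax over negative squared Euclidean distance.

    Assumption: a fixed kernel standing in for a learned similarity metric.
    """
    scores = [
        -sum((q - k) ** 2 for q, k in zip(query, key)) / temperature
        for key in keys
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def plan_q_function(memory, actions, gamma=0.9, n_iters=50, temperature=0.01):
    """Run value iteration over stored transitions and return Q(state, action).

    memory: list of (state, action, reward, next_state, done) tuples,
            where each state is a tuple of floats (hypothetical layout).
    """
    # Indices of stored transitions, grouped by the action that was taken.
    by_action = {a: [i for i, t in enumerate(memory) if t[1] == a] for a in actions}
    # values[i] estimates V(next_state) for stored transition i.
    values = [0.0] * len(memory)

    def q_value(state, action, current_values):
        idxs = by_action[action]
        if not idxs:
            return 0.0  # no stored experience with this action
        keys = [memory[i][0] for i in idxs]
        weights = similarity_weights(state, keys, temperature)
        # Similarity-weighted average of one-step backups read from memory.
        return sum(
            w * (memory[i][2] + gamma * current_values[i])
            for w, i in zip(weights, idxs)
        )

    for _ in range(n_iters):
        values = [
            0.0 if done else max(q_value(next_state, a, values) for a in actions)
            for (_s, _a, _r, next_state, done) in memory
        ]

    return lambda state, action: q_value(state, action, values)


# Toy 1-D chain: moving right from state 1.0 ends the episode with reward 1.
memory = [
    ((0.0,), "right", 0.0, (1.0,), False),
    ((1.0,), "right", 1.0, (2.0,), True),
    ((1.0,), "left", 0.0, (0.0,), False),
    ((0.0,), "left", 0.0, (0.0,), False),
]
q = plan_q_function(memory, actions=["left", "right"])

# Changed reward structure: only the stored rewards are edited, then we re-plan.
changed_memory = [
    ((0.0,), "right", 0.0, (1.0,), False),
    ((1.0,), "right", 0.0, (2.0,), True),
    ((1.0,), "left", 0.0, (0.0,), False),
    ((0.0,), "left", 1.0, (0.0,), False),
]
q_changed = plan_q_function(changed_memory, actions=["left", "right"])
```

Because the values are recomputed from the stored transitions each time, editing the rewards in memory changes the plan without touching the similarity function. In this sketch that is the sense in which a model-based structure can adapt to a new reward structure from the stored experience alone.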
