Recurrent Reinforcement Learning: A Hybrid Approach

Jianfeng Gao; Jianshu Chen; Ji He; Li Deng; Lihong Li; Xiaodong He; Xiujun Li

arxiv: 1509.03044 · v2 · pith:AT3RYEHQnew · submitted 2015-09-10 · cs.LG · cs.AI · cs.SY

Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, Ji He

classification cs.LG · cs.AI · cs.SY

keywords learning · states · approach · reinforcement · component · domain · hidden · history

Successful applications of reinforcement learning in real-world problems often require dealing with partially observable states. Constructing and inferring hidden states is in general very challenging, as they often depend on the agent's entire interaction history and may require substantial domain knowledge. In this work, we investigate a deep-learning approach to learning the representation of states in partially observable tasks, with minimal prior knowledge of the domain. In particular, we propose a new family of hybrid models that combine the strengths of supervised learning (SL) and reinforcement learning (RL), trained jointly. The SL component can be a recurrent neural network (RNN) or its long short-term memory (LSTM) variant, which can capture long-term dependencies on history and thus provides an effective way of learning the representation of hidden states. The RL component is a deep Q-network (DQN) that learns to optimize the control for maximizing long-term rewards. Extensive experiments on a direct mailing campaign problem demonstrate the effectiveness and advantages of the proposed approach, which performs best among a set of previous state-of-the-art methods.
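
The hybrid structure described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the class name `HybridSketch`, the plain tanh recurrence (standing in for an RNN/LSTM), the linear Q head (standing in for a DQN), the choice of the next observation as the supervised target, and the simple weighted sum of the two losses are all assumptions made for illustration. It uses only the Python standard library and computes the joint loss without performing any gradient update.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; the initialisation scheme is an arbitrary choice."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class HybridSketch:
    """Hypothetical SL + RL hybrid: a recurrent encoder shared by two heads."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.W_in = rand_matrix(hidden_dim, obs_dim, rng)      # observation -> hidden
        self.W_rec = rand_matrix(hidden_dim, hidden_dim, rng)  # hidden -> hidden
        self.W_pred = rand_matrix(obs_dim, hidden_dim, rng)    # SL head (assumed target: next observation)
        self.W_q = rand_matrix(n_actions, hidden_dim, rng)     # RL head (one Q-value per action)

    def step(self, h, obs):
        """One recurrent update of the hidden state."""
        a = matvec(self.W_in, obs)
        b = matvec(self.W_rec, h)
        return [math.tanh(ai + bi) for ai, bi in zip(a, b)]

    def encode(self, history):
        """Fold the whole observation history into a hidden-state vector."""
        h = [0.0] * self.hidden_dim
        for obs in history:
            h = self.step(h, obs)
        return h

    def q_values(self, h):
        """Q-value estimates computed from the learned hidden state."""
        return matvec(self.W_q, h)

    def joint_loss(self, history, action, reward, next_obs, gamma=0.9, sl_weight=1.0):
        """Weighted sum of an SL prediction loss and a squared one-step TD error."""
        h = self.encode(history)
        pred = matvec(self.W_pred, h)
        sl_loss = sum((p - o) ** 2 for p, o in zip(pred, next_obs)) / len(next_obs)
        h_next = self.step(h, next_obs)
        td_target = reward + gamma * max(self.q_values(h_next))
        td_error = td_target - self.q_values(h)[action]
        return sl_weight * sl_loss + td_error ** 2


model = HybridSketch(obs_dim=3, hidden_dim=4, n_actions=2)
history = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
hidden = model.encode(history)
q = model.q_values(hidden)
loss = model.joint_loss(history, action=0, reward=1.0, next_obs=[0.0, 0.0, 1.0])
```

In this sketch the hidden state is the only input to the Q head, so whatever the recurrent encoder learns from the supervised signal is what the control component acts on. A real implementation would backpropagate the joint loss through both heads and the encoder, typically with an automatic differentiation library.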

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Shaping Belief States with Generative Environment Models for RL
cs.LG 2019-06 unverdicted novelty 5.0

Multi-step predictive generative models form stable belief states capturing environment layout and agent pose, yielding higher data efficiency on RL tasks than model-free agents.