Averaged-DQN: Variance reduction and stabilization for deep reinforcement learning
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2019 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
In Hindsight: A Smooth Reward for Steady Exploration
Adding a hindsight factor that integrates historic temporal differences into the Q-learning loss reduces overestimation and yields higher average scores than DQN, DDQN, and dueling networks on Atari games after 10 million frames.