Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening
Original abstract
We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.
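In the paper's scheme, n-step returns along stored trajectories give lower and upper bounds on Q(s, a), and bound violations are penalized alongside the usual one-step TD loss, which is what propagates reward faster. A minimal PyTorch-style sketch of such a combined loss follows; the function name, batch layout, and the penalty weight `lam` are illustrative assumptions, not the authors' code.

```python
# Sketch of an optimality-tightening loss in the spirit of the abstract above.
# All names and the batch layout are assumptions for illustration.
import torch
import torch.nn.functional as F

def optimality_tightening_loss(q_net, target_net, batch, gamma=0.99,
                               k_max=4, lam=4.0):
    """One-step TD loss plus quadratic penalties for violating k-step
    lower/upper bounds on Q(s_j, a_j).

    `batch` is assumed to hold, for each sampled transition j:
      states, actions, rewards, next_states, dones            (one-step data)
      future_returns[k], future_states[k]  for k = 1..k_max   (look-ahead)
      past_returns[k], past_states[k], past_actions[k]        (look-back)
    where *_returns[k] are discounted k-step reward sums.
    """
    q_sa = q_net(batch["states"]).gather(
        1, batch["actions"].unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Standard one-step bootstrapped target.
        next_q = target_net(batch["next_states"]).max(dim=1).values
        y = batch["rewards"] + gamma * (1.0 - batch["dones"]) * next_q

        # Lower bounds from look-ahead:
        # L_j = max_k [ sum_{i<k} gamma^i r_{j+i} + gamma^k max_a Q'(s_{j+k}, a) ]
        lower = []
        for k in range(1, k_max + 1):
            tail = target_net(batch["future_states"][k]).max(dim=1).values
            lower.append(batch["future_returns"][k] + gamma**k * tail)
        L = torch.stack(lower, dim=0).max(dim=0).values

        # Upper bounds from look-back: the bound on Q(s_{j-k}, a_{j-k})
        # through s_j rearranges to
        # U_j = min_k gamma^{-k} (Q'(s_{j-k}, a_{j-k}) - R_{j-k..j-1}).
        upper = []
        for k in range(1, k_max + 1):
            q_past = target_net(batch["past_states"][k]).gather(
                1, batch["past_actions"][k].unsqueeze(1)).squeeze(1)
            upper.append((q_past - batch["past_returns"][k]) / gamma**k)
        U = torch.stack(upper, dim=0).min(dim=0).values

    td_loss = F.mse_loss(q_sa, y)
    below = F.relu(L - q_sa).pow(2).mean()  # Q fell below a lower bound
    above = F.relu(q_sa - U).pow(2).mean()  # Q exceeded an upper bound
    return td_loss + lam * (below + above)
```

The penalties vanish whenever the learned Q-values already respect the bounds, so the objective reduces to plain deep Q-learning on consistent estimates.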
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
- Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
  LQL stabilizes Q-learning by penalizing violations of n-step action-sequence lower bounds with a hinge loss computed from standard network outputs.
- Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities
  LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes (see the sketch after this list).
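The blurbs describe a hinge-style penalty on n-step lower bounds, i.e. a linear rather than squared violation term. A minimal sketch of such a term follows; the names `states_n`, `returns_n`, and the batch layout are illustrative assumptions, not LQL's actual API.

```python
# Hinge penalty on an n-step lower bound, as the citing paper's summary
# describes it. Names and batch layout are assumptions for illustration.
import torch
import torch.nn.functional as F

def n_step_lower_bound_hinge(q_net, target_net, batch, gamma=0.99, n=10):
    q_sa = q_net(batch["states"]).gather(
        1, batch["actions"].unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # L_j = sum_{i<n} gamma^i r_{j+i} + gamma^n max_a Q'(s_{j+n}, a)
        tail = target_net(batch["states_n"]).max(dim=1).values
        lower = batch["returns_n"] + gamma**n * tail
    # Hinge (linear) violation: zero whenever the bound already holds.
    return F.relu(lower - q_sa).mean()
```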