Handling delay in real-time reinforcement learning. arXiv preprint arXiv:2503.23478
1 Pith paper cites this work. Polarity classification is still being indexed.
Representative citing paper (field: cs.LG; year: 2026; verdict: unverdicted):
Finding the Time to Think: Learning Planning Budgets in Real-Time RL
Trains a gating policy to select state-dependent planning budgets in variable-delay real-time RL, outperforming fixed-budget and heuristic baselines across Pac-Man, Tetris, Snake, Speed Hex, and Speed Go.