Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
3 Pith papers cite this work.
Citation summary: years: 2026 (3); roles: background (1); polarities: background (1).
citing papers explorer
- Unifying Hamilton-Jacobi Reachability and Reinforcement Learning
A travel-cost value function defined via a proposed running cost is the unique bounded viscosity solution to a time-dependent HJB PDE whose negative sublevel set is the strict backward-reachable tube, and small-step RL value iteration converges to the forward discounted HJB solution.
- FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
- A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations
A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.