Behavior-regularized actor-critic methods achieve strong offline RL results with simple regularization, rendering many recent technical additions unnecessary.
Striving for simplicity in off-policy deep reinforcement learning
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
fields
cs.LG 3verdicts
UNVERDICTED 3representative citing papers
Many batch RL algorithms underperform both online DQN and the behavioral policy on Atari; an adapted discrete-action BCQ outperforms the others tested.
Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.
citing papers explorer
-
Behavior Regularized Offline Reinforcement Learning
Behavior-regularized actor-critic methods achieve strong offline RL results with simple regularization, rendering many recent technical additions unnecessary.
-
Benchmarking Batch Deep Reinforcement Learning Algorithms
Many batch RL algorithms underperform both online DQN and the behavioral policy on Atari; an adapted discrete-action BCQ outperforms the others tested.