Issues in Using Function Approximation for Reinforcement Learning
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
citation-role summary: background 2
citation-polarity summary
fields: cs.LG 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
GPLD applies a row-wise Jacobian penalty to DreamerV3's posterior latent distribution, producing higher sample efficiency on DeepMind Control proprioceptive tasks.
citing papers explorer
- Behavior-Consistent Deep Reinforcement Learning
QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.
- Dreaming Smoothly and Sample Efficiently with Gradient Penalized Latent Dynamics
GPLD applies a row-wise Jacobian penalty to DreamerV3's posterior latent distribution, producing higher sample efficiency on DeepMind Control proprioceptive tasks.