Nature , volume=
4 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (1)
polarities: background (1)
citing papers explorer
- Behavior-Consistent Deep Reinforcement Learning
  QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.
- Distributional Off-Policy Evaluation with Deep Quantile Process Regression
  DQPOPE estimates the entire return distribution in off-policy evaluation via deep quantile process regression, providing statistical advantages over standard single-value methods with equivalent sample sizes.
- Using Common Random Numbers for Simulation-based Planning with Rollouts
  Using common random numbers in rollout simulations provably reduces variance in relative utility estimates when a rollout policy is invoked beyond some depth.
- RADS: Reinforcement Learning-Based Sample Selection Improves Transfer Learning in Low-resource and Imbalanced Clinical Settings
  RADS applies reinforcement learning to pick informative samples for transfer learning, improving performance over uncertainty and diversity sampling in low-resource imbalanced clinical settings.