Calibrated Model-Based Deep Reinforcement Learning

Ali Malik, Volodymyr Kuleshov, Jiaming Song, Danny Nemer, Harlan Seymour, Stefano Ermon


classification cs.LG, stat.ML

keywords learning, model-based, reinforcement, calibrated, performance, deep, planning, predictive


Estimates of predictive uncertainty are important for accurate model-based planning and reinforcement learning. However, predictive uncertainties, especially ones derived from modern deep learning systems, can be inaccurate and impose a bottleneck on performance. This paper explores which uncertainties are needed for model-based reinforcement learning and argues that good uncertainties must be calibrated, i.e., their probabilities should match empirical frequencies of predicted events. We describe a simple way to augment any model-based reinforcement learning agent with a calibrated model and show that doing so consistently improves planning, sample complexity, and exploration. On the HalfCheetah MuJoCo task, our system achieves state-of-the-art performance using 50% fewer samples than the current leading approach. Our findings suggest that calibration can improve the performance of model-based reinforcement learning with minimal computational and implementation overhead.
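
The notion of calibration used in the abstract (predicted probabilities matching empirical frequencies) can be illustrated with a small check. The sketch below is not the paper's method, which the abstract does not spell out; it only measures calibration for a hypothetical Gaussian predictive model on synthetic data. The function names `gaussian_cdf` and `calibration_curve`, the noise levels, and the sample size are all assumptions made for illustration.

```python
import math
import numpy as np


def gaussian_cdf(y, mean, std):
    """CDF of N(mean, std^2) evaluated at y, elementwise."""
    z = (np.asarray(y) - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))


def calibration_curve(cdf_values, levels):
    """For each level p, the fraction of outcomes whose predicted CDF is <= p.

    A calibrated forecaster yields a fraction close to p at every level.
    """
    cdf_values = np.asarray(cdf_values)
    return np.array([np.mean(cdf_values <= p) for p in levels])


# Synthetic data (assumption): outcomes are the predicted mean plus unit noise.
rng = np.random.default_rng(0)
n = 20000
mean = rng.normal(size=n)
y = mean + rng.normal(scale=1.0, size=n)

levels = np.linspace(0.1, 0.9, 9)

# Model A reports the true noise scale; model B reports half of it,
# so its predictive intervals are too narrow (overconfident).
calibrated = calibration_curve(gaussian_cdf(y, mean, 1.0), levels)
overconfident = calibration_curve(gaussian_cdf(y, mean, 0.5), levels)

calibrated_error = float(np.max(np.abs(calibrated - levels)))
overconfident_error = float(np.max(np.abs(overconfident - levels)))

print("max calibration gap, correct std:", calibrated_error)
print("max calibration gap, std too small:", overconfident_error)
```

Under these assumptions, the model reporting the true noise scale should track the diagonal closely, while the overconfident model should show a clearly larger gap between stated probability and observed frequency.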



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees
cs.LG 2026-04 unverdicted novelty 8.0

RHC-UCRL is the first algorithm for safety-constrained RL under explicit adversarial dynamics, providing sub-linear regret and constraint violation guarantees by maintaining optimism over both agent and adversary policies.