The Surprising Difficulty of Search in Model-Based Reinforcement Learning

2026 · cs.LG · arXiv 2601.21306

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

This paper investigates search in model-based reinforcement learning (RL). Conventional wisdom holds that long-term predictions and compounding errors are the primary obstacles for model-based RL. We challenge this view, showing that search is not a drop-in replacement for a learned policy. Surprisingly, we find that search can harm performance even when the model is highly accurate. Instead, we show that mitigating overestimation bias matters more than improving model or value function accuracy. Building on this insight, we identify that taking the minimum over an ensemble of value functions effectively addresses this bias and enables effective search, achieving state-of-the-art performance across multiple popular benchmark domains.
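The overestimation effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's implementation: it assumes every candidate action has the same true value and that each value function in the ensemble returns that value plus independent Gaussian noise. The names `min_ensemble_value` and `simulate`, and all parameter defaults, are hypothetical and chosen only for illustration. The search step is reduced to picking the maximum estimated value among the candidates.

```python
import random


def min_ensemble_value(estimates):
    """Element-wise minimum over ensemble members.

    estimates[m][a] is member m's value estimate for candidate a.
    Returns one pessimistic value per candidate.
    """
    num_candidates = len(estimates[0])
    return [min(member[a] for member in estimates) for a in range(num_candidates)]


def simulate(num_trials=2000, num_candidates=16, ensemble_size=5,
             noise=1.0, seed=0):
    """Compare the bias of max-over-candidates under two value estimates.

    Assumption (toy setting): every candidate has true value 0, and each
    ensemble member observes it with independent Gaussian noise.
    Returns (bias with a single value function, bias with min-over-ensemble).
    """
    rng = random.Random(seed)
    true_value = 0.0
    bias_single = 0.0
    bias_min = 0.0
    for _ in range(num_trials):
        estimates = [
            [true_value + rng.gauss(0.0, noise) for _ in range(num_candidates)]
            for _ in range(ensemble_size)
        ]
        # Search with one value function: the max picks up positive noise.
        bias_single += max(estimates[0]) - true_value
        # Search with the ensemble minimum: each candidate is scored pessimistically.
        bias_min += max(min_ensemble_value(estimates)) - true_value
    return bias_single / num_trials, bias_min / num_trials


if __name__ == "__main__":
    single, pessimistic = simulate()
    print(f"mean bias, single value function: {single:.3f}")
    print(f"mean bias, min over ensemble:     {pessimistic:.3f}")
```

In this toy setting the single-estimate search is biased upward even though each individual estimate is unbiased, because the maximum selects whichever candidate happened to receive the largest positive noise. Taking the minimum over the ensemble before the maximum pulls the selected value back down. How this carries over to a learned model and learned value functions is what the paper itself studies.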

representative citing papers

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

cs.LG · 2026-05-25 · unverdicted · novelty 5.0

MBDPO reformulates policy optimization as a diffusion process over searched trajectories in latent world models to reduce misalignment between search and value learning.

citing papers explorer

Showing 1 of 1 citing paper.

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

cs.LG · 2026-05-25 · unverdicted · none · ref 15 · internal anchor

MBDPO reformulates policy optimization as a diffusion process over searched trajectories in latent world models to reduce misalignment between search and value learning.
