Combating the Compounding-Error Problem with a Multi-step Model

URL http://arxiv · 2019 · cs.LG · arXiv 1905.13320

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

Model-based reinforcement learning is an appealing framework for creating agents that learn, plan, and act in sequential environments. Model-based algorithms typically involve learning a transition model that takes a state and an action and outputs the next state (a one-step model). This model can be composed with itself to enable predicting multiple steps into the future, but one-step prediction errors can get magnified, leading to unacceptable inaccuracy. This compounding-error problem plagues planning and undermines model-based reinforcement learning. In this paper, we address the compounding-error problem by introducing a multi-step model that directly outputs the outcome of executing a sequence of actions. Novel theoretical and empirical results indicate that the multi-step model is more conducive to efficient value-function estimation, and it yields better action selection compared to the one-step model. These results make a strong case for using multi-step models in the context of model-based reinforcement learning.
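The compounding-error effect described in the abstract can be illustrated with a toy rollout. The sketch below is not the paper's method or experiment: the scalar dynamics `true_step`, the slightly biased `one_step_model`, and the helpers `rollout` and `compounding_errors` are all hypothetical names and values chosen only for illustration. It composes a one-step model with itself and reports how far the prediction drifts from the true state as the horizon grows.

```python
# Illustrative sketch only: the dynamics and the model error below are
# assumptions chosen for this example, not taken from the paper.

def true_step(state, action):
    """Hypothetical ground-truth dynamics: s' = 0.9 * s + a."""
    return 0.9 * state + action


def one_step_model(state, action):
    """Hypothetical learned one-step model with a slightly wrong coefficient."""
    return 0.95 * state + action


def rollout(step_fn, state, actions):
    """Compose a one-step function with itself along an action sequence."""
    states = []
    for action in actions:
        state = step_fn(state, action)
        states.append(state)
    return states


def compounding_errors(start_state, actions):
    """Absolute gap between the model rollout and the true rollout per horizon."""
    true_states = rollout(true_step, start_state, actions)
    model_states = rollout(one_step_model, start_state, actions)
    return [abs(m - t) for m, t in zip(model_states, true_states)]


if __name__ == "__main__":
    errors = compounding_errors(start_state=1.0, actions=[1.0] * 10)
    for horizon, err in enumerate(errors, start=1):
        print(f"horizon {horizon:2d}: error {err:.3f}")
```

In this toy setting the one-step error is small, but each composed step feeds an already wrong state back into the model, so the gap widens with the horizon. The multi-step model in the abstract avoids this composition by outputting the outcome of an action sequence directly.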

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Long-Horizon Model-Based Offline Reinforcement Learning Without Explicit Conservatism

cs.LG · 2025-12-04 · conditional · novelty 7.0

NEUBAY uses Bayesian posteriors over world models with long-horizon planning to match or exceed conservative offline RL methods without explicit conservatism.

Advantage-Guided Diffusion for Model-Based Reinforcement Learning

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

Advantage-guided diffusion (SAG and EAG) steers sampling in diffusion world models to higher-advantage trajectories, enabling policy improvement and better sample efficiency on MuJoCo tasks.

Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

cs.LG · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

Dream-MPC refines policy-generated trajectories by gradient ascent in a latent world model with uncertainty regularization and temporal amortization, improving base policy performance and beating gradient-free MPC on 24 continuous control tasks.

Is Conditional Generative Modeling all you need for Decision-Making?

cs.LG · 2022-11-28 · unverdicted · novelty 6.0

Return-conditional diffusion models for policies outperform offline RL on benchmarks by circumventing dynamic programming and enable constraint or skill composition.

Rethinking Code Review in the Age of AI: A Vision for Agentic Code Review

cs.SE · 2026-05-17 · unverdicted · novelty 5.0

The paper presents a vision for an agentic code review framework spanning PR Creation, Augmentation, Reviewer Selection, AI-Assisted Review, and Retrospective, with humans retained at quality gates.

citing papers explorer

Showing 5 of 5 citing papers.

Long-Horizon Model-Based Offline Reinforcement Learning Without Explicit Conservatism

cs.LG · 2025-12-04 · conditional · none · ref 1 · internal anchor

Advantage-Guided Diffusion for Model-Based Reinforcement Learning

cs.AI · 2026-04-10 · unverdicted · none · ref 25

Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

cs.LG · 2026-05-06 · unverdicted · none · ref 1 · 2 links · internal anchor

Is Conditional Generative Modeling all you need for Decision-Making?

cs.LG · 2022-11-28 · unverdicted · none · ref 126 · internal anchor

Rethinking Code Review in the Age of AI: A Vision for Agentic Code Review

cs.SE · 2026-05-17 · unverdicted · none · ref 224 · internal anchor
