Learning to combat compounding-error in model-based reinforcement learning

Chenjun Xiao, Yifan Wu, Chen Ma, Dale Schuurmans, Martin Müller · 2019 · arXiv 1912.11206

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Advantage-Guided Diffusion for Model-Based Reinforcement Learning

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

Advantage-guided diffusion (SAG and EAG) steers sampling in diffusion world models to higher-advantage trajectories, enabling policy improvement and better sample efficiency on MuJoCo tasks.

When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

cs.AI · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

Learns state-conditioned commitment depth in a 7B vision-language policy that jointly predicts actions and replan intervals, outperforming fixed-depth baselines and larger models on Sliding Puzzle and Sokoban while providing a theoretical dominance result.

Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning

cs.AI · 2026-05-07 · 4 refs

Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

cs.LG · 2026-05-06

citing papers explorer

Showing 4 of 4 citing papers.

Advantage-Guided Diffusion for Model-Based Reinforcement Learning

cs.AI · 2026-04-10 · unverdicted · none · ref 24

When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

cs.AI · 2026-05-11 · unverdicted · none · ref 51 · 3 links

Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning

cs.AI · 2026-05-07 · unreviewed · ref 34 · 4 links

Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

cs.LG · 2026-05-06 · unreviewed · ref 9
