pith. sign in

Model-Based Planning with Discrete and Continuous Actions

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it
abstract

Action planning using learned and differentiable forward models of the world is a general approach which has a number of desirable properties, including improved sample complexity over model-free RL methods, reuse of learned models across different tasks, and the ability to perform efficient gradient-based optimization in continuous action spaces. However, this approach does not apply straightforwardly when the action space is discrete. In this work, we show that it is in fact possible to effectively perform planning via backprop in discrete action spaces, using a simple paramaterization of the actions vectors on the simplex combined with input noise when training the forward model. Our experiments show that this approach can match or outperform model-free RL and discrete planning methods on gridworld navigation tasks in terms of performance and/or planning time while using limited environment interactions, and can additionally be used to perform model-based control in a challenging new task where the action space combines discrete and continuous actions. We furthermore propose a policy distillation approach which yields a fast policy network which can be used at inference time, removing the need for an iterative planning procedure.

fields

cs.LG 4 cs.CL 1

representative citing papers

Mastering Atari with Discrete World Models

cs.LG · 2020-10-05 · accept · novelty 7.0

DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

Dream to Control: Learning Behaviors by Latent Imagination

cs.LG · 2019-12-03 · accept · novelty 7.0

Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

cs.LG · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

Dream-MPC refines policy-generated trajectories by gradient ascent in a latent world model with uncertainty regularization and temporal amortization, improving base policy performance and beating gradient-free MPC on 24 continuous control tasks.

Reasoning with Language Model is Planning with World Model

cs.CL · 2023-05-24 · unverdicted · novelty 6.0

RAP turns LLMs into dual world-model and planning agents via MCTS to generate better reasoning paths, outperforming CoT baselines and achieving 33% relative gains over GPT-4 CoT using LLaMA-33B on plan generation.

citing papers explorer

Showing 5 of 5 citing papers.

  • Mastering Atari with Discrete World Models cs.LG · 2020-10-05 · accept · none · ref 25 · internal anchor

    DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

  • Dream to Control: Learning Behaviors by Latent Imagination cs.LG · 2019-12-03 · accept · none · ref 22

    Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

  • Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination cs.LG · 2026-05-06 · unverdicted · none · ref 3 · 2 links · internal anchor

    Dream-MPC refines policy-generated trajectories by gradient ascent in a latent world model with uncertainty regularization and temporal amortization, improving base policy performance and beating gradient-free MPC on 24 continuous control tasks.

  • Reasoning with Language Model is Planning with World Model cs.CL · 2023-05-24 · unverdicted · none · ref 82 · internal anchor

    RAP turns LLMs into dual world-model and planning agents via MCTS to generate better reasoning paths, outperforming CoT baselines and achieving 33% relative gains over GPT-4 CoT using LLaMA-33B on plan generation.

  • stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation cs.LG · 2026-05-20 · unverdicted · none · ref 32 · internal anchor

    The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.