GLENS uses diffusion models on solver iterates to generate high-quality and diverse initial guesses for multimodal non-convex optimization, leading to faster solver convergence.
hub Canonical reference
Planning with Diffusion for Flexible Behavior Synthesis
Canonical reference. 73% of citing Pith papers cite this work as background.
abstract
Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple, this combination has a number of empirical shortcomings, suggesting that learned models may not be well-suited to standard trajectory optimization. In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories. We show how classifier-guided sampling and image inpainting can be reinterpreted as coherent planning strategies, explore the unusual and useful properties of diffusion-based planning methods, and demonstrate the effectiveness of our framework in control settings that emphasize long-horizon decision-making and test-time flexibility.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.
CoDi decomposes the multi-agent diffusion score into pre-trained single-agent policies plus a gradient-free cost guidance term to generate coordinated behavior from single-agent data alone.
Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
SDGD uses cost-conditioned classifier-free guidance plus reward guidance with feasible trajectory relabeling to generate safe high-reward trajectories that adapt to changing safety budgets in offline RL.
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
PRISM lets pre-trained text-to-image models handle long prompts by breaking them into compositional parts, predicting noise separately, and merging outputs via energy-based conjunction, matching fine-tuned models while generalizing better to prompts over 500 tokens.
ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.
Advantage-guided diffusion (SAG and EAG) steers sampling in diffusion world models to higher-advantage trajectories, enabling policy improvement and better sample efficiency on MuJoCo tasks.
ReV is a referring-aware visuomotor policy using coupled diffusion heads for real-time trajectory replanning in robotic manipulation, trained solely via targeted perturbations to expert demonstrations and achieving higher success rates in simulated and real tasks.
GeCO replaces time-dependent flow matching with time-unconditional optimization, enabling adaptive inference and intrinsic OOD detection for robotic imitation learning.
DARE performs sample-level constraint relaxation in offline-to-online RL by conditioning on behavioral consistency with a behavior model via posterior-induced exchange, yielding improved fine-tuning stability and performance on D4RL benchmarks.
MIMIC-D enables multi-modal multi-agent coordination via joint training of decentralized diffusion policies using only local information.
BeyondMimic combines compact motion tracking with a unified guided latent diffusion model to master diverse agile behaviors from human demos and solve unseen downstream tasks via test-time classifier guidance.
DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
BiTrajDiff augments offline RL datasets by running independent forward and backward diffusion processes from intermediate states, yielding higher performance than prior one-directional data-augmentation baselines on D4RL.
RoboDreamer factorizes video generation using language primitives to achieve compositional generalization in robot world models, outperforming monolithic baselines on unseen goals in RT-X.
Diffusion-QL uses conditional diffusion models as expressive policies in offline RL by coupling behavior cloning with Q-value maximization, achieving SOTA on most D4RL tasks.
Derail adversarial perturbations hijack the scoring head in generative E2E driving planners, flipping safe to unsafe trajectory selection with 39-80% score drops and up to 50% collision rates.
RS-Diffuser integrates diffusion planners, quantile regression critics, and CVaR-style guidance to produce risk-averse to risk-seeking behaviors from one model in offline RL.
GDSD reduces RL for dLLMs to likelihood-free self-distillation via a normalization-free logit-matching objective, outperforming ELBO methods with more stable training on LLaDA-8B and Dream-7B.
Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.
PPM derives a tractable gradient for exact KL optimization in diffusion variational inversion to achieve unbiased posterior matching without heuristic approximations.
CEDGE applies energy-guided trajectory diffusion to generate adapted samples for off-dynamics offline RL, improving planning and policy learning on the ODRL benchmark.
citing papers explorer
-
Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning
A semi-supervised MOL framework for diffusion models with generalization bounds depending only on specialist model complexity, extended to diffusion policies for sequential decisions.
-
Diverse Yet Consistent: Context-Guided Diffusion with Energy-Based Joint Refinement for Multi-Agent Motion Prediction
A context-guided diffusion model with energy-based joint refinement improves both marginal and joint metrics for multi-agent motion prediction on standard benchmarks.
-
Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition
PACTS jointly model action trajectories and predicate belief trajectories in a single generative policy, enabling zero-shot skill composition via symbolic planning without retraining.
-
SWEET: Sparse World Modeling with Image Editing for Embodied Task Execution
SWEET is a one-shot sparse visual planning framework that progressively generates manipulation keyframes via image editing conditioned on language and spatial guidance, then converts them to actions with a diffusion predictor, showing better fidelity and lower cost than video models on DROID and Rob
-
Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies
Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.
-
Insider Attacks in Multi-Agent LLM Consensus Systems
A malicious agent in multi-agent LLM consensus systems can be trained via a surrogate world model and RL to reduce consensus rates and prolong disagreement more effectively than direct prompt attacks.
-
Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning
Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.
-
VGAS: Value-Guided Action-Chunk Selection for Few-Shot Vision-Language-Action Adaptation
VGAS uses best-of-N selection with a geometrically grounded critic and explicit regularization to improve success rates of few-shot VLA policies under limited data and distribution shifts.
-
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.
- Rectified Schr\"odinger Bridge Matching for Few-Step Visual Navigation
- Frictional Q-Learning