Diffusion models for reinforcement learning: A survey

2023 · arXiv 2311.01223

8 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: background (2)

citation-polarity summary: background (2)

representative citing papers

Aligning Flow Map Policies with Optimal Q-Guidance

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.

Muninn: Your Trajectory Diffusion Model But Faster

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.

TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics

cs.AI · 2026-04-20 · conditional · novelty 7.0

TacticGen generates realistic, adaptable football tactics via a multi-agent diffusion transformer trained on 3.3M events and 100M frames, supporting rule-, language-, or model-based guidance at inference time.

Improved Sample Complexity For Diffusion Model Training Without Empirical Risk Minimizer Access

cs.LG · 2025-05-23 · conditional · novelty 7.0

The paper establishes an O(ε^{-4}) sample complexity bound for score estimation in diffusion models without requiring access to the empirical risk minimizer.

Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

The paper recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces DIDA, an annealed algorithm that converges to the global optimum.

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.

Diffusion Policy Policy Optimization

cs.RO · 2024-09-01 · unverdicted · novelty 6.0

DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.

Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation

cs.RO · 2026-04-07

citing papers explorer

Aligning Flow Map Policies with Optimal Q-Guidance cs.LG · 2026-05-12 · unverdicted · none · ref 52
Muninn: Your Trajectory Diffusion Model But Faster cs.RO · 2026-05-11 · unverdicted · none · ref 74
TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics cs.AI · 2026-04-20 · conditional · none · ref 47
Improved Sample Complexity For Diffusion Model Training Without Empirical Risk Minimizer Access cs.LG · 2025-05-23 · conditional · none · ref 20
Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing cs.LG · 2026-05-15 · unverdicted · none · ref 239
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making cs.LG · 2026-05-15 · unverdicted · none · ref 84
Diffusion Policy Policy Optimization cs.RO · 2024-09-01 · unverdicted · none · ref 108
Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation cs.RO · 2026-04-07 · unreviewed · ref 16