Diffusion Policy Policy Optimization

Allen Z. Ren, Justin Lidard, Lars L. Ankile, Anthony Simeonov, Pulkit Agrawal, Anirudha Majumdar · 2024 · cs.RO · arXiv 2409.00588

Mixed citation behavior. Most common role is background (43%).

30 Pith papers citing it

Background 43% of classified citations

abstract

We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy gradient (PG) method from reinforcement learning (RL). PG methods are ubiquitous in training RL policies with other policy parameterizations; nevertheless, they had been conjectured to be less efficient for diffusion-based policies. Surprisingly, we show that DPPO achieves the strongest overall performance and efficiency for fine-tuning in common benchmarks compared to other RL methods for diffusion-based policies and also compared to PG fine-tuning of other policy parameterizations. Through experimental investigation, we find that DPPO takes advantage of unique synergies between RL fine-tuning and the diffusion parameterization, leading to structured and on-manifold exploration, stable training, and strong policy robustness. We further demonstrate the strengths of DPPO in a range of realistic settings, including simulated robotic tasks with pixel observations, and via zero-shot deployment of simulation-trained policies on robot hardware in a long-horizon, multi-stage manipulation task. Website with code: diffusion-ppo.github.io

citation-role summary

background 5 · baseline 1 · method 1

citation-polarity summary

background 3 · unclear 2 · baseline 1 · use method 1

representative citing papers

BrickCraft: Visuomotor Skill Composition with Situated Manual Guidance for Long-Horizon Interlocking Brick Assembly

cs.RO · 2026-05-08 · unverdicted · novelty 7.0

BrickCraft composes reusable visuomotor skills via relative anchoring to partial structures and situated visual manuals to achieve long-horizon interlocking brick assembly from limited demonstrations with generalization to unseen designs.

ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching

cs.RO · 2026-04-13 · unverdicted · novelty 7.0

ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.

You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector

cs.RO · 2026-03-16 · conditional · novelty 7.0

Optimizing a single constant initial noise vector for frozen generative robot policies improves success rates on 38 of 43 tasks by up to 58% relative improvement.

AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning

cs.RO · 2025-12-02 · conditional · novelty 7.0

AID trains diffusion policies via behavior cloning on existing MAIPP planners followed by RL fine-tuning to achieve faster execution and higher information gain in multi-agent coordination.

EXPO: Stable Reinforcement Learning with Expressive Policies

cs.LG · 2025-07-10 · conditional · novelty 7.0

EXPO stabilizes online RL for expressive policies by training a base policy with imitation and using a lightweight Gaussian edit policy to select higher-value actions on the fly for sampling and TD backups.

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

cs.RO · 2025-06-18 · unverdicted · novelty 7.0

DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

cs.CV · 2025-06-09 · unverdicted · novelty 7.0

ReCogDrive unifies VLM scene understanding with a diffusion planner reinforced by DiffGRPO to reach state-of-the-art results on NAVSIM and Bench2Drive benchmarks.

Score-Based One-step MeanFlow Policy Optimization

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

SOM is an actor-critic algorithm that constructs the target velocity field for one-step MeanFlow policies directly from the Q-function via score estimation and probability flow ODE, achieving claimed SOTA on locomotion tasks with reduced training and inference time.

Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.

Long-Horizon Q-Learning: Accurate Value Learning via n-Step Inequalities

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

LQL turns n-step action-sequence lower bounds into a practical hinge-loss stabilizer for off-policy Q-learning without extra networks or forward passes.

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

cs.RO · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.

One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.

NaP-Control: Navigating Diffusion Prior for Versatile and Fast Character Control

cs.GR · 2026-04-15 · unverdicted · novelty 6.0

NaP-Control uses RL to directly predict optimized diffusion noise from a task-agnostic prior, enabling fast inference and higher success rates for versatile whole-body character control while preserving motion quality.

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

cs.RO · 2026-03-16 · conditional · novelty 6.0

ExpertGen generates high-success expert policies in simulation from imperfect priors by freezing a diffusion behavior model and optimizing its initial noise via RL, then distills them for real-robot deployment.

What Does Flow Matching Bring To TD Learning?

cs.LG · 2026-03-04 · conditional · novelty 6.0

Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.

Space Syntax-guided Post-training for Residential Floor Plan Generation

cs.LG · 2026-02-26 · unverdicted · novelty 6.0

SSPT turns space-syntax integration metrics into post-training feedback signals that improve public-space dominance and functional hierarchy in AI-generated residential floor plans.

RL-RIG: A Generative Spatial Reasoner via Intrinsic Reflection

cs.CV · 2026-02-23 · unverdicted · novelty 6.0

RL-RIG uses a generate-reflect-edit loop with reinforcement learning to improve spatial accuracy in image generation, reporting up to 11% gains over prior open-source models on scene-graph metrics.

How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models?

cs.LG · 2026-02-02 · unverdicted · novelty 6.0

ALGD augments the Lagrangian to locally convexify the energy landscape in diffusion models, stabilizing safe RL training and generation without changing optimal policies.

Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

cs.LG · 2025-09-26 · unverdicted · novelty 6.0

A method trains discrete diffusion policies for combinatorial RL by matching to a PMD-regularized target distribution, reporting SOTA performance and sample efficiency on DNA generation, macro-action, and multi-agent benchmarks.

AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation

cs.CV · 2025-07-17 · unverdicted · novelty 6.0

AnyPos automates task-agnostic action collection and inverse-dynamics modeling with arm/end-effector decoupling plus a direction-aware decoder, delivering 51% higher test accuracy and 30-40% better success rates on bimanual tasks.

Reinforcement Learning with Action Chunking

cs.LG · 2025-07-10 · unverdicted · novelty 6.0

Q-chunking improves offline-to-online RL sample efficiency on long-horizon sparse-reward manipulation tasks by applying action chunking to TD learning.

DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control

cs.RO · 2025-02-09 · unverdicted · novelty 6.0

DexVLA combines a scaled diffusion action expert with embodiment curriculum learning to achieve better generalization and performance than prior VLA models on diverse robot hardware and long-horizon tasks.

DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization

cs.CV · 2024-12-20 · unverdicted · novelty 6.0

DOLLAR combines variational score and consistency distillation for few-step video generation plus latent reward optimization, reporting 82.57 VBench score and up to 278x speedup over the teacher diffusion model for 128-frame 10-second videos.

EponaV2: Driving World Model with Comprehensive Future Reasoning

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.

citing papers explorer

Showing 2 of 2 citing papers after filters.

SCRIPT: Scalable Diffusion Policy with Multi-stage Training for Language-driven Physics-based Humanoid Control

cs.GR · 2026-05-21 · unreviewed · ref 32 · internal anchor

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

cs.LG · 2026-05-04 · unreviewed · ref 171 · internal anchor
