pith. machine review for the scientific record.

arxiv: 2605.09999 · v1 · submitted 2026-05-11 · 💻 cs.RO · cs.PF · cs.SY · eess.SY

Recognition: 2 theorem links · Lean theorem

Muninn: Your Trajectory Diffusion Model But Faster

Authors on Pith · no claims yet

Pith reviewed 2026-05-12 03:44 UTC · model grok-4.3

classification 💻 cs.RO · cs.PF · cs.SY · eess.SY
keywords diffusion models · trajectory planning · robot motion planning · denoising acceleration · caching · real-time control · sampling speedup · error bounds

The pith

Muninn speeds up diffusion trajectory planners by up to 4.6x by caching denoiser outputs whose reuse is provably safe.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Diffusion-based planners generate rich robot motions through repeated denoising but run too slowly for real-time control. Muninn adds a lightweight wrapper that reuses a cached denoiser output at any step when a cheap probe shows the trajectory has changed little enough that the final plan will still stay inside a chosen deviation bound. The bound is computed from an offline-calibrated combination of observed trajectory shifts and the known way denoiser errors affect the sampler update. When the bound is tight the system recomputes; otherwise it reuses the cache. The result is fewer network calls while task success and safety margins remain intact.
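Read mechanically, the decision rule above is a small loop around the sampler. A minimal sketch of such a reuse-or-recompute wrapper (all names hypothetical, not the authors' code; `probe_score`, `U`, and `D_max` stand in for the paper's probe, calibrated bound, and deviation budget):

```python
def muninn_sample(denoiser, step, tau, T, D_max, probe_score, U):
    """Illustrative Muninn-style reuse-or-recompute loop (hypothetical API).

    denoiser(tau, t)   -- the expensive network call
    step(tau, eps, t)  -- one reverse-diffusion sampler update
    probe_score(a, b)  -- cheap trajectory-change signal between two states
    U(score)           -- calibrated deviation cost of reusing the cache
    D_max              -- user-chosen bound on final-trajectory deviation
    """
    budget = D_max        # uncertainty budget to spend over the rollout
    cached = None         # last computed denoiser output
    prev_tau = list(tau)
    n_calls = 0
    for t in range(T, 0, -1):
        cost = U(probe_score(tau, prev_tau))   # predicted deviation if we reuse
        if cached is not None and cost <= budget:
            eps = cached                        # reuse: spend from the budget
            budget -= cost
        else:
            eps = denoiser(tau, t)              # recompute: refresh the cache
            cached = eps
            n_calls += 1
        prev_tau = list(tau)
        tau = step(tau, eps, t)                 # sampler update
    return tau, n_calls
```

With a zero budget every step recomputes (full-compute behavior); with a generous budget the denoiser is called once and its output reused thereafter, which is where the saved network calls come from.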

Core claim

By tracking a running uncertainty budget built from a trajectory-change probe and analytic coefficients that propagate denoiser error through the sampler, Muninn decides at each diffusion step whether to reuse a cached network output or recompute it, delivering up to 4.6x wall-clock speedups through fewer denoiser evaluations while guaranteeing the final trajectory lies within a user-specified distance of the full-compute result.

What carries the argument

The per-step uncertainty score that upper-bounds final-trajectory deviation when a cached denoiser output is reused, obtained by calibrating an online trajectory-change probe against offline analytic error-propagation coefficients.
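Figure 5 describes this calibration as a regressor m(s) widened into U(s) by split conformal prediction. A toy sketch of that recipe (a least-squares line plus a held-out residual quantile; illustrative only, the paper's features and regressor are richer):

```python
import statistics

def calibrate_bound(scores, errors, alpha=0.1):
    """Toy split-conformal calibration of a score -> deviation bound U(s).

    Fit m(s) = a*s + b on the first half of the (score, error) pairs, then
    add the ~(1 - alpha) quantile of residuals from the held-out half, so
    U(s) = m(s) + q over-covers the observed errors at roughly that rate.
    """
    n = len(scores) // 2
    fit_s, fit_e = scores[:n], errors[:n]
    cal_s, cal_e = scores[n:], errors[n:]
    # least-squares line m(s) = a*s + b on the fitting split
    ms, me = statistics.fmean(fit_s), statistics.fmean(fit_e)
    a = sum((s - ms) * (e - me) for s, e in zip(fit_s, fit_e)) \
        / sum((s - ms) ** 2 for s in fit_s)
    b = me - a * ms
    # split-conformal offset: high quantile of held-out residuals
    resid = sorted(e - (a * s + b) for s, e in zip(cal_s, cal_e))
    k = min(len(resid) - 1, int((1 - alpha) * (len(resid) + 1)))
    q = resid[k]
    return lambda s: a * s + b + q
```

The conformal offset is what turns a typical-case regressor into a conservative bound: coverage holds at roughly the chosen rate without assuming the residual distribution's shape.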

If this is right

  • Wall-clock speedups of up to 4.6 times appear across several trajectory diffusion models on standard benchmarks.
  • Task performance and safety metrics stay statistically unchanged.
  • Cached trajectories are certified to lie inside a user-chosen distance of their full-compute versions.
  • The wrapper works on any state-space diffusion architecture without retraining.
  • The same speedups and certificates hold in real-time closed-loop navigation and manipulation experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same probe-and-bound idea could be applied to other iterative generative planners that expose cheap change signals.
  • Tighter offline calibration on more diverse trajectories might shrink the uncertainty budget and yield still larger speedups.
  • Because the bound is independent of the particular robot dynamics, the method might transfer to non-robotic diffusion sampling tasks where error certificates matter.
  • Integration with downstream controllers that already consume trajectory uncertainty could turn Muninn's budget into an explicit safety margin.

Load-bearing premise

The per-step score supplies a valid upper bound on how far the finished trajectory can drift when a cached denoiser output is reused during closed-loop robot operation.

What would settle it

Any benchmark or hardware trial in which a Muninn trajectory deviates from its full-compute counterpart by more than the declared bound.
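That falsification test is directly scriptable once trajectories and declared bounds are logged. A sketch of such an audit (the deviation metric d here is a max waypoint-wise Euclidean gap, one plausible choice; the paper's exact metric may differ):

```python
import math

def deviation(tau_a, tau_b):
    """Max waypoint-wise Euclidean gap between two trajectories
    (an illustrative stand-in for the paper's deviation metric d)."""
    return max(math.dist(p, q) for p, q in zip(tau_a, tau_b))

def audit(trials):
    """trials: iterable of (tau_muninn, tau_full, declared_bound).
    Returns indices of trials that falsify the certificate; any
    non-empty result would settle the question against the paper."""
    return [i for i, (ta, tb, bound) in enumerate(trials)
            if deviation(ta, tb) > bound]
```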

Figures

Figures reproduced from arXiv: 2605.09999 by Gokul Puthumanaillam, Hao Jiang, Jose Fuentes, Leonardo Bobadilla, Melkior Ornik, Paulo Padrao, Ruben Hernandez.

Figure 1. High-level overview of Muninn applied to Diffuser. Diffuser recomputes the denoiser at every diffusion step, while Muninn wraps the same model and selectively reuses cached denoiser outputs at some steps, reducing compute while leaving the overall trajectory generation unchanged.
Figure 2. Challenges in caching trajectory diffusion models.
Figure 3. Paired reverse-diffusion step. e_t is the denoiser error from reusing ε̃_t and Δ_t is the induced trajectory mismatch; a Lipschitz bound splits Δ_{t−1} into propagated mismatch and injected error. Both processes start from the same noisy trajectory, so Δ_T = 0, and repeatedly applying the local bound from t = T down to t = 1 yields a closed-form bound on Δ_0.
Figure 4. Probe features and per-step error scores.
Figure 5. (Stages 1–3) From offline rollouts, a calibration set of score–error pairs is built by running the full planner with a ghost reuse chain. (Stage 4) A regressor m(s) models the typical score–error relationship, and split conformal prediction yields U(s). At test time, Muninn maps each score to U(s_t), converts it to a trajectory-level cost, and spends it from the budget.
Figure 7. Angle of attack (hard peg insertion). Red: Diffusion Policy; blue: Muninn. Accompanied by Table IV, which compares the Full (teacher) models against their Muninn-wrapped counterparts on latency, success rate, and collision rate across ASV, UAV, and SO-100 platforms.
Figure 8. D4RL score vs. wall-clock latency on D4RL AntMaze; Muninn reaches the Full model's task success rate 2× faster than the baselines. Accompanied by Table V, which compares the Full (unmodified T-step sampler) model against the baselines and Muninn on task score, expected deviation E[d], and estimated violation rate p̂_viol.
Figure 6. Qualitative results. Full-compute trajectories (GC-Diffuser and B-COD teacher) are shown in red; Muninn-accelerated trajectories are shown in blue.
Figure 9. Scatter plot summarizing the practical acceleration spectrum on the DP3 Pour diffusion policy, with average inference success percentage; GPU-hours are normalized to A10 24 GB-hour equivalents. Muninn provides a deployment-friendly point on the spectrum.
Figure 10. Left: reliability plot of predicted D̂ vs. realized deviation d; Muninn is monotone and conservative (below the diagonal). Right: GC-Diffuser + Muninn on 3D UAV navigation, showing D̂ over control cycles with spikes aligned to near-collision/contact events (red); callouts show the UAV pose at those instants.
Figure 11. D4RL locomotion environments. All three tasks use state vectors of joint configuration and velocity features with continuous actuator actions: HalfCheetah (d_s = 17, d_a = 6), Hopper (d_s = 11, d_a = 3), Walker2d (d_s = 17, d_a = 6).
Figure 13. D4RL long-horizon environment: AntMaze large-play (antmaze-large-play-v0), quadruped navigation to a 2D goal in a maze.
Figure 12. D4RL goal-reaching environments: Maze2D large (maze2d-large-v1), force-actuated point-mass navigation in a 2D maze with state s_t = (x_t, y_t, ẋ_t, ẏ_t) ∈ R^4 and action a_t = (f^x_t, f^y_t) ∈ R^2.
Figure 15. Clutter planning environment: a 7-DoF Franka Emika Panda-class manipulator planned in joint space (q ∈ R^7) under standard Panda joint limits, with URDF-based collision checking against rigid obstacles.
Figure 14. Kuka stacking environment: state–action trajectory diffusion with state s_t ∈ R^39 and action a_t ∈ R^11.
Figure 16. RLBench Reach Target environment: four fixed RGB cameras (front, left_shoulder, right_shoulder, wrist) at 256×256, resized to 84×84 and normalized to [0, 1].
Figure 19. Marine Navigation setting.
Figure 18. Custom DP3 pour environment: DP3 conditions on a point cloud derived from a single-view depth camera; the model was trained and tested on a custom pour environment after a simulator mismatch prevented replicating the original DP3 results.
Figure 20. Crazyflie operating environment: receding-horizon planning at 10 Hz with position/velocity setpoints streamed to the onboard controller at 100 Hz.
Figure 21. SO-100 tabletop manipulation environment.
read the original abstract

Diffusion-based trajectory planners can synthesize rich, multimodal robot motions, but their iterative denoising makes online planning and control prohibitively slow. Existing accelerations either modify the sampler or compress the network--sacrificing plan quality or requiring retraining without accounting for downstream control risk. We address the problem of making diffusion-based trajectory planners fast enough for real-time robot use without retraining the model or sacrificing trajectory quality, and in a way that works across diverse state-space diffusion architectures. Our key insight is that diffusion trajectory planners expose two signals we can exploit: a cheap probe of how their internal trajectory representation changes across steps, and analytic coefficients that describe how denoiser errors affect the sampler's state update. By calibrating the first signal against the second on offline runs, we obtain a per-step score that upper-bounds how far the final trajectory can deviate when we reuse a cached denoiser output, and we treat this bound as an uncertainty budget that we can spend over the denoising process. Building on this insight, we present Muninn, a training-free caching wrapper that tracks this uncertainty budget during sampling and, at each diffusion step, chooses between reusing a cached denoiser output when the predicted deviation is small and recomputing the denoiser when it is not. Across standard benchmarks Muninn delivers up to 4.6x wall-clock speedups across several trajectory diffusion models by reducing denoiser evaluations, while preserving task performance and safety metrics. Muninn further certifies that cached rollouts remain within a specified distance of their full-compute counterparts, and we validate these gains in real-time closed-loop navigation and manipulation hardware deployments. Project page: https://github.com/gokulp01/Muninn.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Muninn, a training-free caching wrapper for diffusion-based trajectory planners. It exploits a cheap trajectory-change probe and analytic denoiser-error coefficients, calibrated offline, to produce a per-step uncertainty score that upper-bounds final-trajectory deviation when cached denoiser outputs are reused. The method decides at each diffusion step whether to reuse the cache or recompute, treating the bound as an expendable uncertainty budget. Across benchmarks it reports up to 4.6x wall-clock speedups while preserving task performance and safety metrics, certifies that cached rollouts stay within a user-specified distance of full-compute counterparts, and validates the approach in real-time closed-loop hardware deployments for navigation and manipulation.

Significance. If the offline-calibrated bound remains valid under closed-loop state feedback, Muninn would offer a general, retraining-free route to real-time deployment of multimodal diffusion planners without sacrificing quality or safety. The analytic grounding of the error propagation and the training-free nature are notable strengths that could apply across diverse state-space architectures. Hardware validation is a positive indicator of practical utility, though the absence of direct bound-violation tests leaves the certification claim dependent on empirical margins.

major comments (3)
  1. [Abstract and §3] Abstract and §3 (Method): the per-step uncertainty score is calibrated on offline open-loop full-compute rollouts to match analytic coefficients; the manuscript provides no direct verification (e.g., measured vs. predicted deviation histograms or worst-case violation rates) that this score continues to upper-bound final-trajectory deviation once states are produced by previously cached trajectories inside a closed loop. This is load-bearing for the certification claim.
  2. [§5] §5 (Experiments): no ablation or sensitivity analysis is reported for the uncertainty-budget threshold (a free parameter); the reported 4.6x speedups and preserved safety metrics could be sensitive to its choice, yet only aggregate results are shown without error bars or per-seed statistics.
  3. [§4.2] §4.2 (Closed-loop validation): hardware deployments preserve safety metrics, but this does not constitute a test of the mathematical bound itself; an empirical safety margin could mask cases where the offline-calibrated score becomes optimistic under distribution shift induced by the caching policy.
minor comments (2)
  1. [§3] Notation for the trajectory-change probe and analytic coefficients should be introduced with explicit equations and a small worked example to improve readability.
  2. [Table 1] Table 1 (speedup results) would benefit from per-model breakdown of cache-hit rate and average deviation observed, rather than only aggregate wall-clock numbers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important aspects of our certification claims and experimental rigor. We address each major comment below, agreeing where revisions are needed and providing clarifications on the theoretical grounding. We will incorporate the suggested additions in the revised manuscript.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (Method): the per-step uncertainty score is calibrated on offline open-loop full-compute rollouts to match analytic coefficients; the manuscript provides no direct verification (e.g., measured vs. predicted deviation histograms or worst-case violation rates) that this score continues to upper-bound final-trajectory deviation once states are produced by previously cached trajectories inside a closed loop. This is load-bearing for the certification claim.

    Authors: We agree that direct verification of the bound under closed-loop state feedback is essential for the certification claim. The analytic error-propagation coefficients are derived from the sampler's deterministic update rule and hold independently of state origin, provided the per-step denoiser error remains within the calibrated range. However, the calibration data are open-loop. In the revision we will add closed-loop simulation experiments that (i) run Muninn with caching, (ii) compute both the predicted per-step uncertainty and the actual final-trajectory deviation from the full-compute baseline, and (iii) report histograms and worst-case violation rates across multiple seeds and environments. This will empirically confirm whether the offline-calibrated score remains a valid upper bound under the distribution shift induced by caching. revision: yes

  2. Referee: [§5] §5 (Experiments): no ablation or sensitivity analysis is reported for the uncertainty-budget threshold (a free parameter); the reported 4.6x speedups and preserved safety metrics could be sensitive to its choice, yet only aggregate results are shown without error bars or per-seed statistics.

    Authors: We acknowledge the value of characterizing sensitivity to the uncertainty-budget threshold. In the original experiments the threshold was chosen to achieve a target speedup while preserving task metrics, but no systematic ablation was presented. In the revised manuscript we will include an ablation study that varies the budget threshold over a range of values, reporting the resulting wall-clock speedup, task success rate, and safety metrics together with mean and standard deviation across at least five random seeds. Error bars will be added to all aggregate plots to quantify variability. revision: yes

  3. Referee: [§4.2] §4.2 (Closed-loop validation): hardware deployments preserve safety metrics, but this does not constitute a test of the mathematical bound itself; an empirical safety margin could mask cases where the offline-calibrated score becomes optimistic under distribution shift induced by the caching policy.

    Authors: The referee is correct that preserved safety metrics in hardware do not directly validate the mathematical bound. The hardware results demonstrate practical feasibility and absence of safety violations under real-world conditions, but they rely on an empirical margin. To address this, the revision will add the closed-loop simulation analysis described in response to the first comment, explicitly comparing predicted versus realized trajectory deviations. These simulations will be performed on the same state distributions encountered in the hardware trials, thereby testing the bound under the precise distribution shift induced by the caching policy. revision: yes

Circularity Check

0 steps flagged

No significant circularity; calibration explicitly empirical and transparently stated

full rationale

The paper describes obtaining the per-step score explicitly via calibration of a trajectory-change probe against analytic denoiser-error coefficients on offline runs, then using the resulting score as an uncertainty budget for caching decisions. This is presented as a practical, training-free wrapper rather than a first-principles derivation claimed to hold by mathematical necessity. No equations or steps in the provided text reduce a claimed prediction or bound to its inputs by construction (e.g., no fitted threshold renamed as an independently derived guarantee). The method acknowledges its dependence on offline data and reports separate empirical validation on benchmarks and hardware; the closed-loop validity concern is a question of assumption strength, not a circular reduction in the derivation itself. No self-citations, ansatzes smuggled via prior work, or renaming of known results appear as load-bearing elements.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The method rests on an offline calibration step that fits the mapping from the two signals to a usable deviation bound; this introduces a data-dependent threshold whose value is not fixed by theory.

free parameters (1)
  • uncertainty budget threshold
    Calibrated once on offline runs to decide when the predicted deviation permits safe reuse of a cached denoiser output.

pith-pipeline@v0.9.0 · 5640 in / 1172 out tokens · 52903 ms · 2026-05-12T03:44:53.140292+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · Jcost uniqueness and convexity · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    We define the per-step trajectory cost associated with ∥e_t∥ as c_t(e_t) := Γ L_t ∥e_t∥. Then (4) simply states that the total trajectory deviation is bounded by the sum of per-step costs, d(τ_0^full, τ̃_0) ≤ ∑_t c_t(e_t). Muninn uses (5) as a budget rule
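    The quoted rule can be recovered by unrolling the local bound from Figure 3's caption; a reconstruction sketch in the quoted notation (per-step gains Γ_t are written generically here, and the paper's exact constants may differ):

```latex
% Local split at step t: propagated mismatch plus injected denoiser error
\|\Delta_{t-1}\| \le L_t\,\|\Delta_t\| + \Gamma_t\,\|e_t\|, \qquad \Delta_T = 0.
% Unrolling from t = T down to t = 1 gives the closed-form bound
\|\Delta_0\| \le \sum_{t=1}^{T} \Big( \prod_{s=1}^{t-1} L_s \Big)\, \Gamma_t\,\|e_t\|,
% and folding each product of Lipschitz constants into a single gain
% recovers the quoted budget form
d\big(\tau^{\mathrm{full}}_0, \tilde{\tau}_0\big) \le \sum_{t} c_t(e_t),
\qquad c_t(e_t) = \Gamma\, L_t\, \|e_t\|.
```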

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

110 extracted references · 110 canonical work pages · 10 internal anchors

  1. [1]

    Approximate caching for efficiently serving Text-to-Image diffusion models

    Shubham Agarwal, Subrata Mitra, Sarthak Chakraborty, Srikrishna Karanam, Koyel Mukherjee, and Shiv Kumar Saini. Approximate caching for efficiently serving Text-to-Image diffusion models. In USENIX Symposium on Networked Systems Design and Implementation, pages 1173–1189, 2024

  2. [2]

    Is conditional generative modeling all you need for decision-making?

    Anurag Ajay, Yilun Du, Abhi Gupta, Joshua B. Tenenbaum, Tommi Jaakkola, and Pulkit Agrawal. Is conditional generative modeling all you need for decision-making? arXiv preprint arXiv:2211.15657, 2022

  3. [3]

    Training Diffusion Models with Reinforcement Learning

    Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301, 2023

  4. [4]

    Token merging for fast stable diffusion

    Daniel Bolya and Judy Hoffman. Token merging for fast stable diffusion. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4599–4603, 2023

  5. [5]

    DiCache: Let diffusion model determine its own cache

    Jiazi Bu, Pengyang Ling, Yujie Zhou, Yibin Wang, Yuhang Zang, Tong Wu, Dahua Lin, and Jiaqi Wang. DiCache: Let diffusion model determine its own cache. arXiv preprint arXiv:2508.17356, 2025

  6. [6]

    Motion planning diffusion: Learning and planning of robot motions with diffusion models

    Joao Carvalho, An T Le, Mark Baierl, Dorothea Koert, and Jan Peters. Motion planning diffusion: Learning and planning of robot motions with diffusion models. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1916–1923, 2023

  7. [7]

    Accelerating vision diffusion transformers with skip branches

    Guanjie Chen, Xinyu Zhao, Yucheng Zhou, Tianlong Chen, and Cheng Yu. Accelerating vision diffusion transformers with skip branches. arXiv preprint arXiv:2411.17616, 2024

  8. [8]

    Adaptive time-stepping schedules for diffusion models

    Yuzhu Chen, Fengxiang He, Shi Fu, Xinmei Tian, and Dacheng Tao. Adaptive time-stepping schedules for diffusion models. In Conference on Uncertainty in Artificial Intelligence, 2024

  9. [9]

    Diffusion policy: Visuomotor policy learning via action diffusion

    Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 44(10-11):1684–1704, 2025

  10. [10]

    End-to-end driving via conditional imitation learning

    Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, and Alexey Dosovitskiy. End-to-end driving via conditional imitation learning. In IEEE International Conference on Robotics and Automation, pages 4693–4700, 2018

  11. [11]

    D4RL: Datasets for Deep Data-Driven Reinforcement Learning

    Justin Fu, Aviral Kumar, Ofir Nachum, G. Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020

  12. [12]

    Adaptive Computation Time for Recurrent Neural Networks

    Alex Graves. Adaptive computation time for recurrent neural networks. arXiv preprint arXiv:1603.08983, 2016

  13. [13]

    Social GAN: Socially acceptable trajectories with generative adversarial networks

    Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. Social GAN: Socially acceptable trajectories with generative adversarial networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2255–2264, 2018

  14. [14]

    Classifier-Free Diffusion Guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022

  15. [15]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Neural Information Processing Systems, volume 33, pages 6840–6851, 2020

  16. [16]

    Cascaded diffusion models for high fidelity image generation

    Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. Journal of Machine Learning Research, 23(47):1–33, 2022

  17. [17]

    Diffusion models as optimizers for efficient planning in offline RL

    Renming Huang, Yunqiang Pei, Guoqing Wang, Yangming Zhang, Yang Yang, Peng Wang, and Hengtao Shen. Diffusion models as optimizers for efficient planning in offline RL. In European Conference on Computer Vision, pages 1–17, 2024

  18. [18]

    Diffusion-based generation, optimization, and planning in 3D scenes

    Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, and Song-Chun Zhu. Diffusion-based generation, optimization, and planning in 3D scenes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16750–16761, 2023

  19. [19]

    The Trajectron: Probabilistic multi-agent trajectory modeling with dynamic spatiotemporal graphs

    Boris Ivanovic and Marco Pavone. The Trajectron: Probabilistic multi-agent trajectory modeling with dynamic spatiotemporal graphs. In IEEE/CVF International Conference on Computer Vision, pages 2375–2384, 2019

  20. [20]

    RLBench: The robot learning benchmark & learning environment

    Stephen James, Zicong Ma, David Rovick Arrojo, and Andrew J. Davison. RLBench: The robot learning benchmark & learning environment. IEEE Robotics and Automation Letters, 5:3019–3026, 2019

  21. [21]

    Planning with Diffusion for Flexible Behavior Synthesis

    Michael Janner, Yilun Du, Joshua B Tenenbaum, and Sergey Levine. Planning with diffusion for flexible behavior synthesis. arXiv preprint arXiv:2205.09991, 2022

  22. [22]

    Tree-guided diffusion planner

    Hyeonseong Jeon, Cheolhong Min, and Jaesik Park. Tree-guided diffusion planner. arXiv preprint arXiv:2508.21800, 2025

  23. [23]

    STOMP: Stochastic trajectory optimization for motion planning

    Mrinal Kalakrishnan, Sachin Chitta, Evangelos Theodorou, Peter Pastor, and Stefan Schaal. STOMP: Stochastic trajectory optimization for motion planning. In IEEE International Conference on Robotics and Automation, pages 4569–4574, 2011

  24. [24]

    Elucidating the design space of diffusion-based generative models

    Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Neural Information Processing Systems, volume 35, pages 26565–26577, 2022

  25. [25]

    Auto-Encoding Variational Bayes

    Diederik P Kingma and Max Welling. Auto-encoding variational Bayes.arXiv preprint arXiv:1312.6114, 2013

  26. [26]

    Desire: Distant future prediction in dynamic scenes with interacting agents

    Namhoon Lee, Wongun Choi, Paul Vernaza, Christo- pher B Choy, Philip HS Torr, and Manmohan Chandraker. Desire: Distant future prediction in dynamic scenes with interacting agents. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 336–345, 2017

  27. [27]

    Lijiang Li, Huixia Li, Xiawu Zheng, Jie Wu, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan, Fei Chao, and Rongrong Ji. AutoDiffusion: Training-free optimization of time steps and architectures for automated diffusion model acceleration. In IEEE/CVF International Conference on Computer Vision, pages 7105–7114, 2023.

  28. [28]

    Weiwei Li and Emanuel Todorov. Iterative linear quadratic regulator design for nonlinear biological movement systems. In International Conference on Informatics in Control, Automation and Robotics, pages 222–229, 2004.

  29. [29]

    Zhixuan Liang, Yao Mu, Mingyu Ding, Fei Ni, Masayoshi Tomizuka, and Ping Luo. AdaptDiffuser: Diffusion models as adaptive self-evolving planners. arXiv preprint arXiv:2302.01877, 2023.

  30. [30]

    Weijie Liu, Peng Zhou, Zhiruo Wang, Zhe Zhao, Haotang Deng, and Qi Ju. FastBERT: A self-distilling BERT with adaptive inference time. In Annual Meeting of the Association for Computational Linguistics, pages 6035–6044, 2020.

  31. [31]

    Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. In Neural Information Processing Systems, volume 35, pages 5775–5787, 2022.

  32. [32]

    Haofei Lu, Dongqi Han, Yifei Shen, and Dongsheng Li. What makes a good diffusion planner for decision making? arXiv preprint arXiv:2503.00535, 2025.

  33. [33]

    Kairong Luo, Caiwei Xiao, Zhiao Huang, Zhan Ling, Yunhao Fang, and Hao Su. Dreamfuser: Value-guided diffusion policy for offline reinforcement learning.

  34. [34]

    Yunhao Luo, Chen Sun, Joshua B. Tenenbaum, and Yilun Du. Potential-based diffusion motion planning. arXiv preprint arXiv:2407.06169, 2024.

  35. [35]

    Yunhao Luo, Utkarsh A. Mishra, Yilun Du, and Danfei Xu. Generative trajectory stitching through diffusion composition. arXiv preprint arXiv:2503.05153, 2025.

  36. [36]

    Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, and Kwan-Yee K. Wong. FasterCache: Training-free video diffusion model acceleration with high quality. arXiv preprint arXiv:2410.19355, 2024.

  37. [37]

    Zhaoyang Lyu, Xudong Xu, Ceyuan Yang, Dahua Lin, and Bo Dai. Accelerating diffusion models via early stop of the diffusion process. arXiv preprint arXiv:2205.12524, 2022.

  38. [38]

    Xinyin Ma, Gongfan Fang, Michael Bi Mi, and Xinchao Wang. Learning-to-cache: Accelerating diffusion transformer via layer caching. In Neural Information Processing Systems, volume 37, pages 133282–133304, 2024.

  39. [39]

    Xinyin Ma, Gongfan Fang, and Xinchao Wang. DeepCache: Accelerating diffusion models for free. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15762–15772, 2024.

  40. [40]

    Xinyin Ma, Runpeng Yu, Gongfan Fang, and Xinchao Wang. dKV-Cache: The cache for diffusion language models. arXiv preprint arXiv:2505.15781, 2025.

  41. [41]

    David Q. Mayne, James B. Rawlings, Christopher V. Rao, and Pierre O. M. Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36(6):789–814, 2000.

  42. [42]

    Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik Kingma, Stefano Ermon, Jonathan Ho, and Tim Salimans. On distillation of guided diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14297–14306, 2023.

  43. [43]

    Taehong Moon, Moonseok Choi, EungGu Yun, Jongmin Yoon, Gayoung Lee, and Juho Lee. Early exiting for accelerated inference in diffusion models. In ICML Workshop on Structured Probabilistic Inference & Generative Modeling, 2023.

  44. [44]

    Taehong Moon, Moonseok Choi, EungGu Yun, Jongmin Yoon, Gayoung Lee, Jaewoong Cho, and Juho Lee. A simple early exiting framework for accelerated sampling in diffusion models. arXiv preprint arXiv:2408.05927, 2024.

  45. [45]

    Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171, 2021.

  46. [46]

    Chaoyi Pan, Zeji Yi, Guanya Shi, and Guannan Qu. Model-based diffusion for trajectory optimization. In Neural Information Processing Systems, volume 37, pages 57914–57943, 2024.

  47. [47]

    Aaditya Prasad, Kevin Lin, Jimmy Wu, Linqi Zhou, and Jeannette Bohg. Consistency policy: Accelerated visuomotor policies via consistency distillation. arXiv preprint arXiv:2405.07503, 2024.

  48. [48]

    Gokul Puthumanaillam, Aditya Penumarti, Manav Vora, Paulo Padrao, Jose Fuentes, Leonardo Bobadilla, Jane Shin, and Melkior Ornik. Belief-conditioned one-step diffusion: Real-time trajectory planning with just-enough sensing. In Conference on Robot Learning, pages 68–92, 2025.

  49. [49]

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.

  50. [50]

    Stéphane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In International Conference on Artificial Intelligence and Statistics, pages 627–635, 2011.

  51. [51]

    Kallol Saha, Vishal Mandadi, Jayaram Reddy, Ajit Srikanth, Aditya Agarwal, Bipasha Sen, Arun Singh, and Madhava Krishna. EDMP: Ensemble-of-costs-guided diffusion for motion planning. In IEEE International Conference on Robotics and Automation, pages 10351–10358, 2024.

  52. [52]

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022.

  53. [53]

    Christoph Schöller and Alois Knoll. FloMo: Tractable motion prediction with normalizing flows. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 7977–7984, 2021.

  54. [54]

    John Schulman, Jonathan Ho, Alex X. Lee, Ibrahim Awwal, Henry Bradlow, and Pieter Abbeel. Finding locally optimal, collision-free trajectories with sequential convex optimization. In Robotics: Science and Systems, 2013.

  55. [55]

    Mingyo Seo, Yoonyoung Cho, Yoonchang Sung, Peter Stone, Yuke Zhu, and Beomjoon Kim. Presto: Fast motion planning using diffusion models based on key-configuration environment representation. In IEEE International Conference on Robotics and Automation, pages 10861–10867, 2025.

  56. [56]

    Yorai Shaoul, Itamar Mishani, Shivam Vats, Jiaoyang Li, and Maxim Likhachev. Multi-robot motion planning with diffusion models. arXiv preprint arXiv:2410.03072, 2024.

  57. [57]

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.

  58. [58]

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.

  59. [59]

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In International Conference on Machine Learning, 2023.

  60. [60]

    Surat Teerapittayanon, Bradley McDanel, and Hsiang-Tsung Kung. BranchyNet: Fast inference via early exiting from deep neural networks. In International Conference on Pattern Recognition, pages 2464–2469, 2016.

  61. [61]

    Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E. Gonzalez. SkipNet: Learning dynamic routing in convolutional networks. In European Conference on Computer Vision, pages 409–424, 2018.

  62. [62]

    Zhendong Wang, Jonathan J. Hunt, and Mingyuan Zhou. Diffusion policies as an expressive policy class for offline reinforcement learning. arXiv preprint arXiv:2208.06193, 2022.

  63. [63]

    Zhendong Wang, Zhaoshuo Li, Ajay Mandlekar, Zhenjia Xu, Jiaojiao Fan, Yashraj S. Narang, Linxi Fan, Yuke Zhu, Yogesh Balaji, Mingyuan Zhou, Ming-Yu Liu, and Yuan Zeng. One-step diffusion policy: Fast visuomotor policies via diffusion distillation. arXiv preprint arXiv:2410.21257, 2024.

  64. [64]

    Grady Williams, Andrew Aldrich, and Evangelos Theodorou. Model predictive path integral control: From theory to parallel computation. Journal of Guidance, Control, and Dynamics, 40(2):344–357, 2017.

  65. [65]

    Rosa Wolf, Yitian Shi, Sheng Liu, and Rania Rayyes. Diffusion models for robotic manipulation: A survey. arXiv preprint arXiv:2504.08438, 2025.

  66. [66]

    Yiming Wu, Huan Wang, Zhenghao Chen, Jianxin Pang, and Dong Xu. On-device diffusion transformer policy for efficient robot manipulation. arXiv preprint arXiv:2508.00697, 2025.

  67. [67]

    Zhou Xian, Nikolaos Gkanatsios, Theophile Gervet, Tsung-Wei Ke, and Katerina Fragkiadaki. ChainedDiffuser: Unifying trajectory diffusion and keypose prediction for robotic manipulation. In Conference on Robot Learning, 2023.

  68. [68]

    Sirui Xie, Zhisheng Xiao, Diederik Kingma, Tingbo Hou, Ying Nian Wu, Kevin P. Murphy, Tim Salimans, Ben Poole, and Ruiqi Gao. EM distillation for one-step diffusion models. In Neural Information Processing Systems, volume 37, pages 45073–45104, 2024.

  69. [69]

    Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. arXiv preprint arXiv:2311.18828, 2023.

  70. [70]

    Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan C. Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning, 2019.

  71. [71]

    Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, and Huazhe Xu. 3D diffusion policy: Generalizable visuomotor policy learning via simple 3D representations. In Robotics: Science and Systems (RSS), 2024.

  72. [72]

    Zheng Zhan, Yushu Wu, Yifan Gong, Zichong Meng, Zhenglun Kong, Changdi Yang, Geng Yuan, Pu Zhao, Wei Niu, and Yanzhi Wang. Fast and memory-efficient video diffusion using streamlined inference. In Neural Information Processing Systems, volume 37, pages 13660–13684, 2024.

  73. [73]

    Hui Zhang, Zuxuan Wu, Zhen Xing, Jie Shao, and Yu-Gang Jiang. AdaDiff: Adaptive step selection for fast diffusion models. In AAAI Conference on Artificial Intelligence, volume 39, pages 9914–9922, 2025.

  74. [74]

    Zhengbang Zhu, Hanye Zhao, Haoran He, Yichao Zhong, Shenyu Zhang, Yong Yu, and Weinan Zhang. Diffusion models for reinforcement learning: A survey. arXiv preprint arXiv:2311.01223, 2023.

  75. [75]

    Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, and Weinan Zhang. MADiff: Offline multi-agent learning with diffusion models. In Neural Information Processing Systems, volume 37, pages 4177–4206, 2024.

  76. [76]

    Matt Zucker, Nathan Ratliff, Anca D. Dragan, Mihail Pivtoraiko, Matthew Klingensmith, Christopher M. Dellin, J. Andrew Bagnell, and Siddhartha S. Srinivasa. CHOMP: Covariant Hamiltonian optimization for motion planning. International Journal of Robotics Research, 32(9-10):1164–1193, 2013.

  Appendix fragments (items 77–80 of the extracted list are appendix text, not references; the recoverable content follows):

    Offline RL / Trajectory Planning (D4RL): All D4RL planners in Table I operate over state–action trajectory segments. Let s_t ∈ R^{d_s} be the environment observation/state and a_t ∈ R^{d_a} the action. A trajectory segment of horizon H is τ = ((s_0, a_0), (s_1, a_1), …, (s_{H−1}, a_{H−1})) ∈ R^{H×(d_s+d_a)}, with d := d_s + d_a, executed in receding-horizon control (reported efficiency metric: #Evals/t). [Remainder truncated.]

    Configuration-space Motion Planning (MPD/EDMP protocol): Table II evaluates configuration-space planning for a 7-DoF robot arm in clutter (Fig. 15). Planning is done for a 7-DoF Franka Emika Panda-class manipulator in joint space with configuration q ∈ R^7, under the standard Panda joint limits (reported safety metric: Collision %). [Remainder truncated.]

    Visuomotor Imitation and Manipulation (Diffusion Policies): Table III evaluates diffusion policies that generate short-horizon action/pose segments. All policies in Table III generate action chunks and execute them in a common receding-horizon control loop. [Remainder truncated.]

    SeaRobotics Surveyor ASV, 2D marine navigation: The unmanned surface vehicle (USV) is a SeaRobotics Surveyor ASV equipped with a differential-thrust propulsion module. The platform exposes a velocity set-point interface (commanded forward speed and heading/yaw-rate), while the low-level propulsion stack converts these setpoints to left/right thruster commands. [Remainder truncated.]
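The D4RL trajectory-segment layout described in the appendix fragment above can be sketched in a few lines of NumPy. This is a minimal illustration of the stated shape τ ∈ R^{H×(d_s+d_a)} and of receding-horizon execution; the dimensions d_s, d_a, and H below are hypothetical placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical dimensions (not from the paper): state dim d_s,
# action dim d_a, planning horizon H.
d_s, d_a, H = 17, 6, 32
rng = np.random.default_rng(0)

# A trajectory segment tau stacks (s_t, a_t) pairs row-wise,
# giving tau in R^{H x (d_s + d_a)}.
states = rng.standard_normal((H, d_s))   # s_0 ... s_{H-1}
actions = rng.standard_normal((H, d_a))  # a_0 ... a_{H-1}
tau = np.concatenate([states, actions], axis=1)
assert tau.shape == (H, d_s + d_a)  # (32, 23)

# Receding-horizon execution: apply only the first planned action,
# then replan from the next observed state.
first_action = tau[0, d_s:]
assert first_action.shape == (d_a,)
```

Under this layout, one diffusion call denoises the whole H-step segment at once, which is why caching or skipping denoiser evaluations (the #Evals/t metric) directly reduces per-control-step latency.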