Factored Diffusion Policies: Compositionally Generalized Robot Control with a Single Score Network

Abhishek Pai; Ege Yuceel; Noah Giles; Sayan Mitra

arXiv: 2605.22596 · v1 · pith:JDFKCSRM · submitted 2026-05-21 · 💻 cs.LG


Pith reviewed 2026-05-22 06:58 UTC · model grok-4.3

classification 💻 cs.LG

keywords: factored diffusion policies · compositional generalization · robot control · score decomposition · trajectory certificates · drone racing


The pith

A single shared diffusion network with per-factor null-token dropout composes scores additively to generalize robot control to unseen factor combinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that one diffusion network can be trained to handle robotic tasks specified by multiple factors, such as objects, obstacles, and colors, without collecting data for every possible combination. Because the network is trained with null-token dropout applied to individual factors, its score function decomposes additively across factors at inference time. Under approximate conditional independence of the factors given the action and observation, this additive composition approximates the true joint score within a uniform error bound. The bound is then propagated through the reverse-time diffusion ODE and a contracting tracking controller to produce an explicit certificate on the radius of the closed-loop state-trajectory tube. In drone-racing trials, the factored policy matches oracle performance on held-out combinations, while a K-network composition baseline collapses.
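Per-factor null-token dropout can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: factors are assumed to be discrete and encoded as integer ids, index `cardinalities[k]` is reserved as factor k's null token (so each embedding table would need one extra row), and each factor is dropped independently with probability `p_drop`. Independent dropping exposes the network to every subset of active factors, including the single-factor and all-null conditions that additive composition queries at inference. `drop_factors` is a hypothetical name.

```python
import numpy as np

def drop_factors(factor_ids, cardinalities, p_drop, rng):
    """Per-factor null-token dropout for a batch of task-factor indices.

    factor_ids: int array of shape (batch, K); column k holds a value in
        [0, cardinalities[k]).
    cardinalities: length-K sequence; index cardinalities[k] is reserved as
        the (hypothetical) null token for factor k.
    Each entry is replaced by its factor's null token independently with
    probability p_drop, so every subset of factors appears during training.
    """
    ids = np.asarray(factor_ids)
    null_ids = np.asarray(cardinalities)[None, :]   # shape (1, K), broadcasts over batch
    drop = rng.random(ids.shape) < p_drop           # independent coin per (sample, factor)
    return np.where(drop, null_ids, ids)


rng = np.random.default_rng(0)
batch = np.array([[2, 0], [1, 2], [0, 1]])          # e.g. (track, gate-size) ids
print(drop_factors(batch, cardinalities=(4, 3), p_drop=0.3, rng=rng))
```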

Core claim

Under approximate conditional independence between factors given the action-observation pair, the additive composition of per-factor scores from a single shared diffusion network approximates the true joint score with a bounded uniform error. This reduces the training-task requirement from the product of factor cardinalities to their sum. A trajectory-tube certificate chains the score-level bound through the reverse-time sampling ODE and a contracting tracking controller to certify a closed-loop state-trajectory tube whose radius factors into an ODE-sensitivity constant and a per-factor score-error budget.
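The product-to-sum reduction is easy to make concrete. The snippet below is only arithmetic on hypothetical factor cardinalities (4 tracks and 3 gate sizes are illustrative choices, not the paper's experimental counts), and `task_budgets` is a made-up helper.

```python
from itertools import product
from math import prod

def task_budgets(cardinalities):
    """Return (joint, factored) training-task counts.

    joint: one task per combination of factor values (product of cardinalities).
    factored: the sum of cardinalities, the budget the additive composition
        is claimed to need.
    """
    return prod(cardinalities), sum(cardinalities)


# Hypothetical example: 4 tracks x 3 gate sizes.
cards = (4, 3)
joint, factored = task_budgets(cards)
all_combos = list(product(*(range(c) for c in cards)))
print(joint, factored, len(all_combos))   # 12 7 12

# The gap widens quickly: four factors with five values each.
print(task_budgets((5, 5, 5, 5)))         # (625, 20)
```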

What carries the argument

Additive score decomposition from a single network trained with per-factor null-token dropout, certified by chaining the uniform score error bound through the reverse diffusion ODE and contracting controller into a trajectory tube.
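Why additive composition is the right target: if the factors $c_1, \dots, c_K$ are conditionally independent given $(a, o)$, Bayes' rule gives $\nabla_a \log p(a \mid o, c_1, \dots, c_K) = \nabla_a \log p(a \mid o) + \sum_k [\nabla_a \log p(a \mid o, c_k) - \nabla_a \log p(a \mid o)]$, which is the $s_\varnothing + \Delta_1 + \Delta_2$ form shown in Figure 1. The sketch below checks this identity on a hypothetical one-dimensional Gaussian model whose scores are known in closed form (the observation $o$ is dropped, `None` stands in for the null token, and it tests a single noise-free level only). With a trained network, `toy_score` would be replaced by network evaluations under the corresponding null-token masks; all names here are illustrative.

```python
import numpy as np

# Toy model (hypothetical): prior a ~ N(0, PRIOR_VAR); each factor value c_k is
# a noisy reading of a, c_k | a ~ N(a, FACTOR_VARS[k]), independent given a.
PRIOR_VAR = 1.0
FACTOR_VARS = (0.5, 0.25)

def toy_score(a, conds):
    """Exact score d/da log p(a | active conds); None plays the null token."""
    precision, linear = 1.0 / PRIOR_VAR, 0.0
    for c, v in zip(conds, FACTOR_VARS):
        if c is None:          # dropped factor contributes nothing
            continue
        precision += 1.0 / v
        linear += c / v
    return -precision * a + linear

def composed_score(score_fn, a, conds):
    """s_comp = s_null + sum_k (s_k - s_null), with one factor active per term."""
    null = [None] * len(conds)
    s_null = score_fn(a, null)
    total = s_null
    for k, c in enumerate(conds):
        only_k = list(null)
        only_k[k] = c
        total = total + (score_fn(a, only_k) - s_null)
    return total

a = np.linspace(-2.0, 2.0, 5)
conds = (0.3, -1.0)
# Largest discrepancy between composed and joint scores (floating-point round-off).
print(np.max(np.abs(composed_score(toy_score, a, conds) - toy_score(a, conds))))
```

When the factors are dependent given $(a, o)$, the same composition leaves a residual, which is exactly what the paper's uniform error bound is meant to control.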

If this is right

The number of training tasks needed grows with the sum of the factor cardinalities rather than their product.
The policy succeeds on combinations of factors never seen together during training.
Trajectory deviation remains explicitly bounded by the per-factor score error and the contraction rate of the tracking controller.
A single network suffices instead of training and combining separate networks for each factor.
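The tube certificate is described only qualitatively on this page, so the sketch below assumes one plausible chain rather than the paper's theorem: the total score error is at most the sum of per-factor budgets, the reverse ODE amplifies it by a sensitivity constant, the resulting action error enters the tracking loop through an input gain, and the closed loop is contracting at rate $\lambda$ in the Euclidean metric. For a disturbance of size $D$ this gives the standard contraction bound $r(t) \le e^{-\lambda t} r(0) + (1 - e^{-\lambda t}) D / \lambda$. All constants and the helper name `tube_radius` are hypothetical.

```python
import math

def tube_radius(t, eps_per_factor, ode_sensitivity, input_gain,
                contraction_rate, r0=0.0):
    """Hypothetical tube radius at time t for a contracting tracking loop.

    Assumed chain (illustrative constants, not the paper's):
      score error   <= sum(eps_per_factor)             (per-factor budget)
      action error  <= ode_sensitivity * score error    (reverse-ODE sensitivity)
      disturbance   <= input_gain * action error        (enters the tracking loop)
      radius(t)     <= e^{-lam t} r0 + (1 - e^{-lam t}) * disturbance / lam
    """
    if contraction_rate <= 0:
        raise ValueError("contraction_rate must be positive")
    score_err = sum(eps_per_factor)
    disturbance = input_gain * ode_sensitivity * score_err
    decay = math.exp(-contraction_rate * t)
    return decay * r0 + (1.0 - decay) * disturbance / contraction_rate


# Near-steady-state radius for two factors with budgets 0.1 and 0.2.
print(tube_radius(t=10.0, eps_per_factor=(0.1, 0.2), ode_sensitivity=2.0,
                  input_gain=1.5, contraction_rate=3.0))
```

With these made-up numbers the steady-state radius is $(1.5 \cdot 2 / 3)(0.1 + 0.2) \approx 0.3$: the prefactor plays the role of the ODE-sensitivity and contraction constant, and the sum is the per-factor score-error budget.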

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same null-token dropout and additive composition technique could be tested on other score-based or generative models for control.
Collecting data to directly measure conditional dependence among factors would provide a practical check on when the error bound holds.
The tube certificate could be adapted to different controller designs provided their contraction properties are quantified.

Load-bearing premise

The task factors are approximately conditionally independent given the current action and observation.

What would settle it

Compare the composed score against the score of a jointly trained network on held-out factor combinations and check whether the observed uniform error stays within the derived bound; also verify whether closed-loop trajectories remain inside the certified tube radius on physical drone tests.
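Both checks reduce to simple statistics once the data are collected. A sketch, assuming score evaluations of the jointly trained and the composed network at the same (noised action, observation, noise level) samples drawn from held-out combinations, plus logged closed-loop states time-aligned with the reference trajectory; the function names are hypothetical.

```python
import numpy as np

def empirical_uniform_gap(s_joint, s_comp):
    """Largest L2 gap between joint-network and composed scores over samples.

    Arrays have shape (num_samples, action_dim). A finite-sample maximum only
    lower-bounds the true supremum, so passing this check is evidence, not
    proof, that the derived bound holds.
    """
    diff = np.asarray(s_joint) - np.asarray(s_comp)
    return float(np.max(np.linalg.norm(diff, axis=-1)))

def fraction_inside_tube(states, reference, radius):
    """Fraction of closed-loop states within `radius` of the reference path."""
    dist = np.linalg.norm(np.asarray(states) - np.asarray(reference), axis=-1)
    return float(np.mean(dist <= radius))


s_joint = np.array([[0.0, 1.0], [2.0, 0.0]])
s_comp = np.array([[0.0, 1.2], [2.0, -0.1]])
print(empirical_uniform_gap(s_joint, s_comp))
states = np.array([[0.0, 0.0], [3.0, 4.0]])
print(fraction_inside_tube(states, np.zeros_like(states), radius=1.0))
```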

Figures

Figures reproduced from arXiv: 2605.22596 by Abhishek Pai, Ege Yuceel, Noah Giles, Sayan Mitra.

**Figure 1.** Compositional generalization on held-out drone-racing tasks. Closed-loop trajectories on three (track, gate-size) pairs not seen during training; top-down (X–Y), side view (X–Z), and speed vs. arc length. Black dashed: expert reference. Green dotted: oracle (trained on the same pair). Red: unfactored baseline. Blue: factored compositional policy $s_{\mathrm{comp}} = s_\varnothing + \Delta_1 + \Delta_2$. The factored policy tracks expert and …

**Figure 2.** Zero-shot venue transfer. Each row shows the policy on a different (venue, gate-color) pair: field–white, industrial–red, piazza–blue, and pool–white (zero-shot). Pool is entirely excluded from training; the policy never observes pool's photometric distribution (water, sky reflections), yet passes its gate (filled rectangle) using a single onboard RGB camera and a noisy gyro. Left: bird's-eye view of rollo…

**Figure 3.** Per-race closed-loop trajectories with all five methods overlaid. One row per track; columns are XY top-down (geometric route), XZ side profile (height), and speed vs. arc length. For each race we pick the held-out joint pair where one exists (red border, “held-out” badge), else a standard pair (race2 and race3 have no held-out combination). Methods: expert reference (dashed black), baseline (red), factored composed (blue…

**Figure 4.** Generalization (left) and certification (right) for the factored model at N=50. Left: gate passage on training (solid) vs. held-out (hatched) tasks (same headline numbers as …

Original abstract

Robotic tasks are typically specified by a tuple of factors, such as the object to be grasped, the obstacles to be avoided, the color of the target, and so on. The cost of collecting expert demonstrations for every combination of factor values grows combinatorially. We present factored diffusion policies: a single shared diffusion network trained with per-factor null-token dropout, whose score decomposes additively across factors at inference. Under approximate conditional independence between factors given the action-observation pair, this composition approximates the true joint score with a bounded uniform error, reducing the training-task budget from a product of factor cardinalities to a sum. A trajectory-tube certificate chains this score-level bound through the reverse-time sampling ODE and a contracting tracking controller into a closed-loop state-trajectory tube whose radius factors into an ODE-sensitivity constant and a per-factor score-error budget. Unlike compositional-diffusion methods for control that combine separately trained networks, we use one shared network. Drone racing experiments confirm both the generalization bound and the certificate. On state-based multi-gate racing, the factored policy passes 90% of held-out gates (matching an oracle), while a K-network composition baseline collapses to 3%; on vision-based single-gate traversal, it transfers zero-shot to an unseen venue with a +11.7 pp success-rate gain and a 2.4× crash-rate reduction.
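For readers unfamiliar with the reverse-time sampling ODE that the certificate chains through, here is a minimal Euler integrator for the probability-flow ODE in the variance-exploding parametrization, $dx/d\sigma = -\sigma \nabla_x \log p(x; \sigma)$, run from $\sigma_{\max}$ down to 0. The parametrization, linear $\sigma$ schedule, and step count are assumptions (the paper may use a different schedule or DDIM-style updates); in the factored policy, the composed score would be passed as `score_fn`. The usage example compares the integrator with the closed-form flow map for a Gaussian target and shows how a uniform score perturbation shifts the output, which is the quantity an ODE-sensitivity constant bounds.

```python
import numpy as np

def sample_pf_ode(score_fn, x_init, sigma_max, num_steps=1000):
    """Euler steps for dx/dsigma = -sigma * score(x, sigma), sigma_max -> 0."""
    sigmas = np.linspace(sigma_max, 0.0, num_steps + 1)
    x = np.asarray(x_init, dtype=float)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        dx_dsigma = -s_cur * score_fn(x, s_cur)
        x = x + (s_next - s_cur) * dx_dsigma   # step is negative: sigma decreases
    return x


V = 0.5            # variance of a hypothetical Gaussian target N(0, V)
SIGMA_MAX = 10.0

def gaussian_score(x, sigma):
    """Exact score of the noised target N(0, V + sigma^2)."""
    return -x / (V + sigma**2)

x_T = np.array([5.0, -12.0])
x_0 = sample_pf_ode(gaussian_score, x_T, SIGMA_MAX)
exact = x_T * np.sqrt(V / (V + SIGMA_MAX**2))      # closed-form flow map
biased = sample_pf_ode(lambda x, s: gaussian_score(x, s) + 0.05, x_T, SIGMA_MAX)
print(x_0, exact, np.abs(biased - x_0).max())
```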

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this section is the friction.

Desk Editor's Note (private letter to a colleague)

Single shared diffusion net with null-token dropout lets you add per-factor scores for new robot task combos, with experiments showing big gains on drone racing, though the independence assumption behind the bound isn't directly checked.


The main thing to know is that this paper trains one diffusion network on robot control tasks by randomly dropping individual factors with null tokens during training. At inference the scores add up across factors, letting the policy handle unseen combinations without retraining on the full product of possibilities. They back this with a trajectory-tube certificate that chains a score error bound through the reverse ODE and a contracting controller. On state-based multi-gate drone racing the policy passes 90% of held-out gates, matching an oracle, while a K-network composition baseline drops to 3%; the vision-based version transfers zero-shot to a new venue with an 11.7-point success-rate gain and a 2.4× lower crash rate.

Referee Report

1 major / 2 minor

Summary. The paper introduces factored diffusion policies for compositional robot control: a single shared score network is trained via per-factor null-token dropout so that, at inference, the joint score is approximated by an additive sum over per-factor scores. Under an approximate conditional-independence assumption between factors given the action-observation pair, the composition incurs a bounded uniform error; this reduces the required training-task budget from a product to a sum of factor cardinalities. The authors derive a trajectory-tube certificate that propagates the score-level error through the reverse-time diffusion ODE and a contracting tracking controller to obtain a closed-loop state-trajectory guarantee whose radius factors into an ODE-sensitivity constant and a per-factor error budget. Drone-racing experiments (state-based multi-gate and vision-based single-gate) report strong generalization, with the factored policy achieving 90% success on held-out gates (matching an oracle) versus 3% for a K-network baseline, plus zero-shot transfer gains on an unseen venue.

Significance. If the error bound and certificate are valid, the work supplies a practical, single-network route to compositional generalization in diffusion policies that materially lowers the combinatorial data-collection cost for multi-factor robotic tasks while furnishing an explicit closed-loop safety certificate. The empirical margins on drone racing are substantial and directly support the claimed product-to-sum reduction.

major comments (1)

[Error-bound derivation and experimental validation sections] The uniform error bound on additive score composition (and therefore the entire trajectory-tube certificate) rests on the unquantified approximate conditional-independence assumption between factors given the action-observation pair. The manuscript reports no measurement of residual factor dependence (conditional mutual information, correlation after conditioning, etc.) for either the multi-gate or vision-based tasks, nor does it supply an explicit functional dependence of the bound on the strength of dependence. Consequently it is unclear whether the realized score deviation remains inside the per-factor budget allocated to ODE sensitivity and the contracting controller.

minor comments (2)

[Experiments] In the experimental tables, explicitly state the exact number of training factor combinations used for the factored policy versus each baseline so that the claimed data-efficiency gain is numerically transparent.
[Notation and preliminaries] Ensure that the notation for the per-factor score functions, the joint score, and the error terms is introduced once and used consistently in both the main text and the appendix.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the error-bound derivation. We address the concern point-by-point below and will revise the manuscript to strengthen the validation of the approximate conditional-independence assumption.


Referee: [Error-bound derivation and experimental validation sections] The uniform error bound on additive score composition (and therefore the entire trajectory-tube certificate) rests on the unquantified approximate conditional-independence assumption between factors given the action-observation pair. The manuscript reports no measurement of residual factor dependence (conditional mutual information, correlation after conditioning, etc.) for either the multi-gate or vision-based tasks, nor does it supply an explicit functional dependence of the bound on the strength of dependence. Consequently it is unclear whether the realized score deviation remains inside the per-factor budget allocated to ODE sensitivity and the contracting controller.

Authors: We agree that an explicit quantification of residual factor dependence and its effect on the bound would improve the manuscript. In the revision we will (i) compute and report conditional mutual information (and pairwise correlations after conditioning on the action-observation pair) for the factors in both the state-based multi-gate and vision-based single-gate experiments, and (ii) derive and state the explicit functional dependence of the uniform score error on the strength of the conditional dependence (i.e., how the bound scales with the deviation from exact independence). These additions will confirm that the observed score deviation lies inside the per-factor budget used for the ODE-sensitivity and controller contraction constants. (Revision: yes.)
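For discrete factors, one simple way to report the promised conditional mutual information is a plug-in estimator, $I(C_1; C_2 \mid Z) = \sum p(x, y, z) \log \frac{p(x, y, z)\, p(z)}{p(x, z)\, p(y, z)}$. The sketch below assumes the action-observation pair has already been discretized into bins `z` (a modeling choice not specified here), and `plugin_cmi` is a hypothetical helper. Plug-in estimates are biased upward at small sample sizes, so small nonzero values should be read with care.

```python
import math
from collections import Counter

def plugin_cmi(c1, c2, z):
    """Plug-in estimate (in nats) of I(C1; C2 | Z) from paired discrete samples."""
    n = len(z)
    n_xyz = Counter(zip(c1, c2, z))
    n_xz = Counter(zip(c1, z))
    n_yz = Counter(zip(c2, z))
    n_z = Counter(z)
    total = 0.0
    for (x, y, zz), count in n_xyz.items():
        # p(x,y,z) p(z) / (p(x,z) p(y,z)) expressed with raw counts
        ratio = count * n_z[zz] / (n_xz[(x, zz)] * n_yz[(y, zz)])
        total += (count / n) * math.log(ratio)
    return total


# Within each z bin every (c1, c2) pair appears once: conditionally independent.
z = [0, 0, 0, 0, 1, 1, 1, 1]
c1 = [0, 0, 1, 1, 0, 0, 1, 1]
c2 = [0, 1, 0, 1, 0, 1, 0, 1]
print(plugin_cmi(c1, c2, z))   # 0.0
# Second factor copies the first within each bin: ln 2 for binary factors.
print(plugin_cmi(c1, c1, z))   # ≈ 0.693
```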

Circularity Check

0 steps flagged

No significant circularity; derivation follows from stated assumption and standard properties


The paper states the approximate conditional independence assumption explicitly and derives the uniform error bound on additive score composition from it, then chains the bound through the reverse-time sampling ODE and contracting controller using standard sensitivity and contraction arguments. No equation reduces a claimed prediction or first-principles result to a fitted quantity or prior self-citation by construction. The training-budget reduction and trajectory-tube certificate are consequences of the given assumption plus diffusion and control theory, with no self-definitional loop or renamed fitted input visible in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests primarily on the domain assumption of approximate conditional independence between factors; no free parameters or invented entities are described in the abstract.

axioms (1)

Domain assumption: approximate conditional independence between factors given the action-observation pair.
Invoked to bound the uniform error when additively composing per-factor scores.

pith-pipeline@v0.9.0 · 5776 in / 1367 out tokens · 52585 ms · 2026-05-22T06:58:53.774188+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

Under approximate conditional independence between factors given the action-observation pair, this composition approximates the true joint score with a bounded uniform error... Theorem 1 (Decomposition error bound) ... $\|s - s_{\mathrm{comp}}\| \le 2\sqrt{GM}$
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear


A trajectory-tube certificate chains this score-level bound through the reverse-time sampling ODE and a contracting tracking controller

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 2 internal anchors

[1] David Angeli. A Lyapunov approach to incremental stability properties. IEEE Transactions on Automatic Control, 47(3):410–421, 2002.

[3] Joe Benton, Valentin De Bortoli, Arnaud Doucet, and George Deligiannidis. Nearly d-linear convergence bounds for diffusion models via stochastic localization. In International Conference on Learning Representations (ICLR), 2024.

[4] Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. $\pi_0$: A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164, 2024.

[6] Yongyuan Cao et al. Compose your policies! Improving diffusion-based or flow-based robot policies via test-time distribution-level composition. In International Conference on Learning Representations (ICLR), 2026. To appear.

[7] Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, and Anru R. Zhang. Sampling is as easy as learning the score: Theory for diffusion models with minimal data assumptions. In International Conference on Learning Representations (ICLR), 2023.

[8] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Robotics: Science and Systems (RSS), 2023.

[9] Yilun Du, Conor Durkan, Robin Strudel, Joshua B. Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, and Will Grathwohl. Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and MCMC. In International Conference on Machine Learning (ICML), 2023.

[10] Michael Everett, Golnaz Habibi, Chuangchuang Sun, and Jonathan P. How. Reachability analysis of neural feedback loops. IEEE Access, 9:163938–163953, 2021.

[11] Chuchu Fan, James Kapinski, Xiaoqing Jin, and Sayan Mitra. Locally optimal reach set over-approximation for nonlinear systems. In Proceedings of the 13th ACM-SIGBED International Conference on Embedded Software (EMSOFT), EMSOFT '16, pages 6:1–6:10, New York, NY, USA, 2016. ACM. URL http://doi.acm.org/10.1145/2968478.2968482. Nominated for best paper award.
[12] Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George J. Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, 2019.

[13] Philipp Foehn, Angel Romero, and Davide Scaramuzza. Time-optimal planning for quadrotor waypoint flight. Science Robotics, 6(56), 2021.

[14] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. In NeurIPS Workshop on Deep Generative Models and Downstream Applications, 2022.

[15] Yifei Ji et al. Compositional perception contracts for verified autonomy. arXiv preprint, 2025.

[16] Elia Kaufmann, Leonard Bauersfeld, Antonio Loquercio, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Champion-level drone racing using deep reinforcement learning. Nature, 620:982–987, 2023.

[17] Taeyoung Lee, Melvin Leok, and N. Harris McClamroch. Geometric tracking control of a quadrotor UAV on SE(3). In IEEE Conference on Decision and Control (CDC), 2010.

[18] Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, and Ping Luo. SkillDiffuser: Interpretable hierarchical planning via skill abstractions in diffusion-based task execution. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

[20] Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, and Joshua B. Tenenbaum. Compositional visual generation with composable diffusion models. In European Conference on Computer Vision (ECCV), 2022.

[21] Winfried Lohmiller and Jean-Jacques E. Slotine. On contraction analysis for non-linear systems. Automatica, 34(6):683–696, 1998.

[23] Utkarsh A. Mishra, Shangjie Xue, Yongxin Chen, and Danfei Xu. Generative skill chaining: Long-horizon skill planning with diffusion models. In Conference on Robot Learning (CoRL), 2023.
[24] Sachit Pal et al. CoInD: Enabling logical compositions in diffusion models. arXiv preprint, 2024. Please verify exact author list, arXiv ID, and venue.

[25] Moritz Reuss, Ömer Erdinç Yagmurlu, Fabian Wenzel, and Rudolf Lioutikov. Multimodal diffusion transformer: Learning versatile behavior from multimodal goals. In Robotics: Science and Systems (RSS), 2024.

[26] Fereshteh Sadeghi and Sergey Levine. CAD²RL: Real single-image flight without a single real image. In Robotics: Science and Systems (RSS), 2017.

[27] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations (ICLR), 2021a.

[28] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR), 2021b.

[29] Yunlong Song, Angel Romero, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Reaching the limit in autonomous racing: Optimal control versus reinforcement learning. Science Robotics, 8(82), 2023.

[30] Joshua Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.

[31] Lirui Wang, Jialiang Zhao, Yilun Du, Edward H. Adelson, and Russ Tedrake. PoCo: Policy composition from and for heterogeneous robot learning. In Robotics: Science and Systems (RSS), 2024.

[32] Zihao Wang, Lin Gui, Jeffrey Negrea, and Victor Veitch. Concept algebra for (score-based) text-controlled generative models. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

[33] Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, and Huazhe Xu. 3D diffusion policy: Generalizable visuomotor policy learning via simple 3D representations. In Robotics: Science and Systems (RSS), 2024.

[44] Flexible multitask learning with factorized diffusion policy. arXiv preprint arXiv:2512.21898.

[45] Modality-composable diffusion policy via inference-time distribution-level composition. arXiv preprint arXiv:2503.12466.

[46] Composing diffusion policies for few-shot learning of trajectory generation in autonomous racing. arXiv preprint arXiv:2410.17479v1.

[53] Angello Astorga, Chiao Hsieh, P. Madhusudan, and Sayan Mitra. Proc. ACM Program. Lang., October 2023. doi:10.1145/3622875.

[65] Smooth stabilization implies coprime factorization. IEEE Transactions on Automatic Control.

[67] Maxime Oquab, Timoth… Darcet, … Transactions on Machine Learning Research (TMLR).
[68]

2024 , note =

Pal, Sachit and others , journal =. 2024 , note =

work page 2024

[1] [1]

A L yapunov approach to incremental stability properties

David Angeli. A L yapunov approach to incremental stability properties. IEEE Transactions on Automatic Control, 47 0 (3): 0 410--421, 2002

work page 2002

[2] [3]

Nearly d -linear convergence bounds for diffusion models via stochastic localization

Joe Benton, Valentin De Bortoli, Arnaud Doucet, and George Deligiannidis. Nearly d -linear convergence bounds for diffusion models via stochastic localization. In International Conference on Learning Representations (ICLR), 2024

work page 2024

[3] [4]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. _0 : A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164, 2024

[4] Yongyuan Cao et al. Compose your policies! Improving diffusion-based or flow-based robot policies via test-time distribution-level composition. In International Conference on Learning Representations (ICLR), 2026. To appear.

[5] Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, and Anru R. Zhang. Sampling is as easy as learning the score: Theory for diffusion models with minimal data assumptions. In International Conference on Learning Representations (ICLR), 2023.

[6] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In Robotics: Science and Systems (RSS), 2023.

[7] Yilun Du, Conor Durkan, Robin Strudel, Joshua B. Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, and Will Grathwohl. Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and MCMC. In International Conference on Machine Learning (ICML), 2023.

[8] Michael Everett, Golnaz Habibi, Chuangchuang Sun, and Jonathan P. How. Reachability analysis of neural feedback loops. IEEE Access, 9:163938–163953, 2021.

[9] Chuchu Fan, James Kapinski, Xiaoqing Jin, and Sayan Mitra. Locally optimal reach set over-approximation for nonlinear systems. In Proceedings of the 13th ACM-SIGBED International Conference on Embedded Software (EMSOFT), EMSOFT '16, pages 6:1–6:10, New York, NY, USA, 2016. ACM. URL http://doi.acm.org/10.1145/2968478.2968482. Nominated for best paper award.

[10] Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George J. Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS). Curran Associates Inc., Red Hook, NY, USA, 2019.

[11] Philipp Foehn, Angel Romero, and Davide Scaramuzza. Time-optimal planning for quadrotor waypoint flight. Science Robotics, 6(56), 2021.

[12] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. In NeurIPS Workshop on Deep Generative Models and Downstream Applications, 2022.

[13] Yifei Ji et al. Compositional perception contracts for verified autonomy. arXiv preprint, 2025.

[14] Elia Kaufmann, Leonard Bauersfeld, Antonio Loquercio, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Champion-level drone racing using deep reinforcement learning. Nature, 620:982–987, 2023.

[15] Taeyoung Lee, Melvin Leok, and N. Harris McClamroch. Geometric tracking control of a quadrotor UAV on SE(3). In IEEE Conference on Decision and Control (CDC), 2010.

[16] Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, and Ping Luo. SkillDiffuser: Interpretable hierarchical planning via skill abstractions in diffusion-based task execution. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

[17] Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, and Joshua B. Tenenbaum. Compositional visual generation with composable diffusion models. In European Conference on Computer Vision (ECCV), 2022.

[18] Winfried Lohmiller and Jean-Jacques E. Slotine. On contraction analysis for non-linear systems. Automatica, 34(6):683–696, 1998.

[19] Utkarsh A. Mishra, Shangjie Xue, Yongxin Chen, and Danfei Xu. Generative skill chaining: Long-horizon skill planning with diffusion models. In Conference on Robot Learning (CoRL), 2023.

[20] Sachit Pal et al. CoInD: Enabling logical compositions in diffusion models. arXiv preprint, 2024.

[21] Moritz Reuss, Ömer Erdinç Yagmurlu, Fabian Wenzel, and Rudolf Lioutikov. Multimodal diffusion transformer: Learning versatile behavior from multimodal goals. In Robotics: Science and Systems (RSS), 2024.

[22] Fereshteh Sadeghi and Sergey Levine. CAD²RL: Real single-image flight without a single real image. In Robotics: Science and Systems (RSS), 2017.

[23] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations (ICLR), 2021a.

[24] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR), 2021b.

[25] Yunlong Song, Angel Romero, Matthias Müller, Vladlen Koltun, and Davide Scaramuzza. Reaching the limit in autonomous racing: Optimal control versus reinforcement learning. Science Robotics, 8(82), 2023.

[26] Joshua Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.

[27] Lirui Wang, Jialiang Zhao, Yilun Du, Edward H. Adelson, and Russ Tedrake. PoCo: Policy composition from and for heterogeneous robot learning. In Robotics: Science and Systems (RSS), 2024.

[28] Zihao Wang, Lin Gui, Jeffrey Negrea, and Victor Veitch. Concept algebra for (score-based) text-controlled generative models. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

[29] Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, and Huazhe Xu. 3D diffusion policy: Generalizable visuomotor policy learning via simple 3D representations. In Robotics: Science and Systems (RSS), 2024.

[30] Flexible multitask learning with factorized diffusion policy. arXiv preprint arXiv:2512.21898, 2025.

[31] Modality-composable diffusion policy via inference-time distribution-level composition. arXiv preprint arXiv:2503.12466, 2025.

[32] Composing diffusion policies for few-shot learning of trajectory generation in autonomous racing. arXiv preprint arXiv:2410.17479v1, 2024.

[33] Angello Astorga, Chiao Hsieh, P. Madhusudan, and Sayan Mitra. Proceedings of the ACM on Programming Languages (Proc. ACM Program. Lang.), October 2023. doi:10.1145/3622875.

[34] Eduardo D. Sontag. Smooth stabilization implies coprime factorization. IEEE Transactions on Automatic Control, 1989.

[35] Maxime Oquab, Timothée Darcet, et al. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research (TMLR), 2024.