Recognition: no theorem link
Flexible Multitask Learning with Factorized Diffusion Policy
Pith reviewed 2026-05-16 19:19 UTC · model grok-4.3
The pith
A factorized diffusion policy decomposes complex robot actions into specialized sub-models for better multitask performance and flexible adaptation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that factorizing a diffusion policy into specialized sub-models, each capturing a distinct sub-mode of the action distribution, yields policies that fit multimodal robot behavior more effectively and can be extended to new tasks by modular addition or fine-tuning without catastrophic forgetting.
What carries the argument
The factorized diffusion policy: a modular composition of specialized diffusion models, each trained to capture one sub-mode of the robot's multimodal action distribution.
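The abstract does not spell out how the sub-models are combined; as a reading aid, here is one minimal way such a factorized policy could compose experts at inference, assuming a softmax router and a weighted average of predicted noise. The router logits and expert predictions below are toy stand-ins, not the paper's implementation.

```python
import numpy as np

# Toy sketch of composing K specialized denoisers into one policy.
# The paper's exact composition rule is not given in the abstract; this
# assumes a softmax router that weights each expert's predicted noise.

def expert_eps(k, x_t, t):
    """Stand-in for expert k's noise prediction eps_k(x_t, t)."""
    return np.full_like(x_t, 0.1 * k)  # deterministic toy output

def router_weights(x_t, t, K):
    """Toy softmax router over K experts (uniform logits here)."""
    logits = np.zeros(K)  # a learned router would produce these
    w = np.exp(logits - logits.max())
    return w / w.sum()

def composed_eps(x_t, t, K=3):
    """One reverse-diffusion step's composed noise estimate:
    a router-weighted average of the experts' predictions."""
    w = router_weights(x_t, t, K)
    return sum(w[k] * expert_eps(k, x_t, t) for k in range(K))
```

Adding a task would then amount to appending an expert and extending the router's output dimension, leaving existing experts untouched.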
If this is right
- Policies fit multimodal action distributions more accurately than monolithic diffusion models.
- New tasks can be incorporated by adding or fine-tuning only the relevant sub-model rather than retraining the full policy.
- Catastrophic forgetting is inherently reduced because earlier sub-models remain untouched during adaptation.
- The approach outperforms both monolithic diffusion baselines and other modular methods in robotic manipulation.
Where Pith is reading between the lines
- The same factorization idea could be applied to other generative policy architectures beyond diffusion models.
- If sub-modes overlap heavily in real data, the modularity gain may shrink and require an automatic mode-discovery step.
- Long-horizon tasks with many sequential modes would test whether the current decomposition remains stable over extended rollouts.
Load-bearing premise
That highly multimodal robot action distributions can be decomposed into distinct sub-modes that separate diffusion models can capture effectively.
What would settle it
A dataset or task where action distributions show no clear separable sub-modes, such that adding new modules produces no gain in fit or still causes forgetting on prior tasks.
Original abstract
Multitask learning poses significant challenges due to the highly multimodal and diverse nature of robot action distributions. However, effectively fitting policies to these complex task distributions is often difficult, and existing monolithic models often underfit the action distribution and lack the flexibility required for efficient adaptation. We introduce a novel modular diffusion policy framework that factorizes complex action distributions into a composition of specialized diffusion models, each capturing a distinct sub-mode of the behavior space for a more effective overall policy. In addition, this modular structure enables flexible policy adaptation to new tasks by adding or fine-tuning components, which inherently mitigates catastrophic forgetting. Empirically, across both simulation and real-world robotic manipulation settings, we illustrate how our method consistently outperforms strong modular and monolithic baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a modular diffusion policy framework for multitask robot learning that factorizes complex, multimodal action distributions into a composition of specialized diffusion models, each capturing a distinct sub-mode of the behavior space. This structure is claimed to yield more effective policies than monolithic baselines and to enable flexible adaptation to new tasks via addition or fine-tuning of components, inherently mitigating catastrophic forgetting. Empirical results are reported to show consistent outperformance over strong modular and monolithic baselines in both simulation and real-world robotic manipulation settings.
Significance. If the factorization mechanism and its claimed benefits are rigorously demonstrated, the work could meaningfully advance scalable multitask and continual learning for diffusion-based robot policies by addressing underfitting of multimodal actions and forgetting during adaptation. The modular design offers a potentially practical route to lifelong policy extension without retraining from scratch.
major comments (2)
- [Abstract] The central claim that the framework 'factorizes complex action distributions into a composition of specialized diffusion models' is load-bearing for both the performance and adaptation arguments, yet no decomposition procedure, gating/routing mechanism, per-component loss, or inference-time composition rule is described. Without this, it remains unclear whether the modular structure enforces specialization or simply yields a mixture whose benefits could be replicated by a single larger diffusion model.
- [Abstract] The assertion that the modular structure 'inherently mitigates catastrophic forgetting' is presented as a direct consequence of add/fine-tune adaptation, but no supporting analysis (e.g., interference metrics, retention experiments, or comparison to monolithic fine-tuning) is supplied in the provided text. This assumption is critical to the multitask-learning contribution and requires explicit verification.
minor comments (1)
- [Abstract] The abstract refers to 'strong modular and monolithic baselines' without naming them or citing their sources; adding these references would improve reproducibility and context.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve clarity and provide the requested details and analyses.
Point-by-point responses
- Referee: [Abstract] The central claim that the framework 'factorizes complex action distributions into a composition of specialized diffusion models' is load-bearing for both the performance and adaptation arguments, yet no decomposition procedure, gating/routing mechanism, per-component loss, or inference-time composition rule is described. Without this, it remains unclear whether the modular structure enforces specialization or simply yields a mixture whose benefits could be replicated by a single larger diffusion model.
Authors: We agree that the abstract is too high-level and will revise it to briefly describe the factorization. The full technical details are provided in Section 3: the decomposition uses a learned router that assigns action modes to specialized diffusion experts; each expert is trained with its own denoising loss on mode-specific data subsets; and inference composes outputs via weighted averaging of the experts' predicted noise at each diffusion step. We will add a short paragraph to the abstract summarizing these elements and include a clarifying figure in the main text.
Revision: yes
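The training recipe the authors describe (router assigns samples to experts; each expert minimizes its own denoising loss on its subset) can be sketched as follows. The hard sign-based router, the toy noising, and the "perfect" expert predictions are illustrative stand-ins; the paper's Section 3 details are assumed, not quoted.

```python
import numpy as np

# Sketch of the per-expert training signal described in the rebuttal:
# each expert sees only the samples the router assigns to it, and
# minimizes a standard DDPM-style denoising objective on that subset.

rng = np.random.default_rng(0)

def ddpm_loss(eps_pred, eps_true):
    """Standard denoising objective: mean ||eps_pred - eps_true||^2."""
    return float(np.mean((eps_pred - eps_true) ** 2))

def route(actions, K):
    """Toy hard router: split by the sign of the first action dim.
    (A learned router is assumed in the paper; this is a stand-in.)"""
    return (actions[:, 0] > 0).astype(int) % K

def train_step(actions, K=2):
    """One conceptual step: partition the batch by router assignment,
    then compute each expert's loss only on its own subset."""
    assign = route(actions, K)
    losses = {}
    for k in range(K):
        subset = actions[assign == k]
        if len(subset) == 0:
            continue
        eps_true = rng.standard_normal(subset.shape)
        x_t = subset + eps_true      # toy "noised" actions
        eps_pred = x_t - subset      # a perfect toy expert's prediction
        losses[k] = ddpm_loss(eps_pred, eps_true)
    return losses
```

Because gradients for expert k flow only through its own subset, the other experts' parameters are untouched, which is the mechanism the forgetting claim rests on.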
- Referee: [Abstract] The assertion that the modular structure 'inherently mitigates catastrophic forgetting' is presented as a direct consequence of add/fine-tune adaptation, but no supporting analysis (e.g., interference metrics, retention experiments, or comparison to monolithic fine-tuning) is supplied in the provided text. This assumption is critical to the multitask-learning contribution and requires explicit verification.
Authors: We acknowledge that the abstract presents this as inherent without sufficient evidence. Section 5.3 already contains retention experiments on sequential task addition, showing near-zero performance drop on prior tasks when only new components are added or fine-tuned (versus clear degradation in monolithic fine-tuning baselines). We will expand this into a dedicated subsection with explicit interference metrics (e.g., average policy divergence before/after adaptation) and direct comparisons to monolithic baselines, and we will update the abstract to reference these results.
Revision: yes
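The retention analysis the authors promise reduces to a simple metric: the average drop in per-task success after adapting to a new task. The task names and success rates below are illustrative placeholders, not results from the paper.

```python
# Hedged sketch of a retention (forgetting) metric of the kind the
# authors propose to report: compare per-task success rates before and
# after adaptation. All numbers here are made up for illustration.

def average_forgetting(before, after):
    """Mean drop in success rate on previously learned tasks.
    `before`/`after` map task name -> success rate in [0, 1]."""
    drops = [before[t] - after[t] for t in before]
    return sum(drops) / len(drops)

# Modular adaptation (old experts frozen) vs. monolithic fine-tuning:
modular = average_forgetting({"pick": 0.9, "place": 0.8},
                             {"pick": 0.9, "place": 0.79})
monolithic = average_forgetting({"pick": 0.9, "place": 0.8},
                                {"pick": 0.6, "place": 0.5})
```

Under this metric, "near-zero performance drop" corresponds to `average_forgetting` close to 0 for the modular policy while the monolithic baseline's value is substantially larger.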
Circularity Check
No circularity: framework claims rest on empirical validation, not self-referential reduction
Full rationale
The paper introduces a modular diffusion policy that factorizes action distributions into specialized components and claims this structure inherently supports adaptation without catastrophic forgetting. These properties are asserted as consequences of the proposed architecture and are supported by direct empirical comparisons against baselines in simulation and real-robot settings. No equations, fitting procedures, or self-citations are presented in the abstract or described claims that reduce the central results to the inputs by construction; the factorization mechanism and its benefits are treated as design choices whose effectiveness is measured externally rather than derived tautologically from the same data or prior self-referential theorems.