Action-to-Action Flow Matching

Gen Li; Jianfei Yang; Jindou Jia; Jingliang Li; Tuo An; Xiangyu Chen; Xinying Guo; Yuxuan Hu

arxiv: 2602.07322 · v2 · submitted 2026-02-07 · 💻 cs.RO · cs.AI

Jindou Jia, Gen Li, Xiangyu Chen, Tuo An, Yuxuan Hu, Jingliang Li, Xinying Guo, Jianfei Yang

Pith reviewed 2026-05-16 06:50 UTC · model grok-4.3

keywords: flow matching · action generation · robotics policies · proprioceptive feedback · diffusion models · real-time control · generalization

The pith

Flow matching generates robot actions from prior states in one step

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes replacing random noise sampling in diffusion-based robot policies with initialization from the robot's previous actions. By embedding historical proprioceptive sequences into a latent space, the method uses this as the starting point for flow matching. This approach aims to capture physical dynamics and temporal continuity directly. As a result, it enables fast inference with as few as one step while improving robustness and generalization. The design challenges the standard practice of starting from uninformed noise.

Core claim

By initializing the flow matching process with embedded historical proprioceptive sequences rather than random Gaussian noise, Action-to-Action flow matching (A2A) produces clean actions in a single step and better captures the robot's physical dynamics and temporal continuity.

What carries the argument

Action-to-Action flow matching (A2A), a policy that embeds historical proprioceptive action sequences into a high-dimensional latent space to serve as the informed starting point for flow-based action generation.
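
As a rough illustration of the mechanism described above, the sketch below runs the same forward-Euler sampler from two starting points: a Gaussian draw and an embedding of the action history. Everything in it is an assumption made for illustration. `embed_history` is a hypothetical linear stand-in for the paper's history encoder, `make_oracle_velocity` builds a toy straight-line field rather than a trained network, and all dimensions are arbitrary. It is not the authors' implementation.

```python
import numpy as np


def embed_history(history, weight):
    """Hypothetical encoder: flatten an (H, action_dim) history and project it linearly."""
    return history.reshape(-1) @ weight


def euler_sample(x0, velocity_fn, num_steps=1):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with forward Euler."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


def make_oracle_velocity(target):
    """Toy field (assumption) whose trajectories are straight lines ending at `target`."""
    def velocity(x, t):
        return (target - x) / (1.0 - t)
    return velocity


rng = np.random.default_rng(0)
history = rng.normal(size=(4, 7))            # 4 past steps of a 7-dim action (made-up sizes)
weight = rng.normal(size=(28, 16)) / 28.0    # untrained projection into a 16-dim latent
target = rng.normal(size=16)                 # stand-in for the clean action

informed_start = embed_history(history, weight)  # history-informed start, as described above
noise_start = rng.normal(size=16)                # conventional Gaussian start

velocity = make_oracle_velocity(target)
one_step_informed = euler_sample(informed_start, velocity, num_steps=1)
one_step_noise = euler_sample(noise_start, velocity, num_steps=1)
```

Because the toy field is exactly straight, a single step reaches `target` from either start. A learned field carries no such guarantee, so the sketch only shows where the initialization enters the sampler, not that an informed start is sufficient for one-step accuracy.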

If this is right

High-quality actions can be generated with minimal inference latency suitable for real-time control.
Improved robustness to visual perturbations compared to standard methods.
Enhanced generalization to unseen robot configurations.
Versatility shown by extension to video generation tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach may reduce the need for multiple denoising steps in other sequential prediction tasks beyond robotics.
Integrating proprioceptive history could lead to more stable policies in dynamic environments where visual input is unreliable.
Single-step generation might enable higher control frequencies in hardware-limited settings.

Load-bearing premise

Embedding historical proprioceptive sequences into a high-dimensional latent space provides an effective starting point that captures physical dynamics and temporal continuity without iterative denoising.

What would settle it

A direct comparison showing that multi-step denoising from random noise outperforms single-step A2A on standard robotic benchmarks would falsify the claim of superior performance.

Original abstract

Diffusion-based policies have recently achieved remarkable success in robotics by formulating action prediction as a conditional denoising process. However, the standard practice of sampling from random Gaussian noise often requires multiple iterative steps to produce clean actions, leading to high inference latency that incurs a major bottleneck for real-time control. In this paper, we challenge the necessity of uninformed noise sampling and propose Action-to-Action flow matching (A2A), a novel policy paradigm that shifts from random sampling to initialization informed by the previous proprioceptive action. Unlike existing methods that treat proprioceptive action feedback as static conditions, A2A leverages historical proprioceptive sequences, embedding them into a high-dimensional latent space as the starting point for action generation. This design bypasses costly iterative denoising while effectively capturing the robot's physical dynamics and temporal continuity. Extensive experiments demonstrate that A2A exhibits high training efficiency, fast inference speed, and improved generalization. Notably, A2A enables high-quality action generation in as few as a single inference step, and exhibits superior robustness to visual perturbations and enhanced generalization to unseen configurations. Lastly, we also extend A2A to video generation, demonstrating its broader versatility in temporal modeling. Project site: https://lorenzo-0-0.github.io/A2A_Flow_Matching.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A2A replaces random noise with prior-action embeddings in flow matching to target one-step inference, but the single-step claim rests on an untested assumption about how close that start lands.

The main point is that this paper takes flow matching for robot policies and initializes the flow from an embedding of recent proprioceptive actions instead of Gaussian noise. The goal is to reach usable actions in one step rather than iterating, which would cut latency in real-time control. They also extend the same setup to video generation as a side test of temporal modeling.

That initialization choice is the concrete shift from standard practice, where past actions usually appear only as conditioning inputs. If the embedding really encodes enough dynamics and continuity, the idea is practical and directly attacks a deployment bottleneck. The motivation lines up with known issues in diffusion-style policies, and the claim of better robustness to visual changes plus generalization to new setups is worth checking against their runs.

The soft spot is the lack of visible numbers or ablations in the write-up so far. The central assumption, that the latent start is close enough in the learned flow metric for a single Euler step to land in the target distribution, needs direct evidence. If actions have discontinuities or the embedding is just a non-recurrent projection, performance would fall back to something closer to an uninformed start. I'd want to see the exact architecture for the embedding, the number of steps used in baselines, and an ablation that turns the history embedding on and off. Without those, it's difficult to separate the contribution of the flow objective from the initialization trick.

This is for people already working on fast generative policies or real-time robot learning. A reader who needs lower inference cost would find the setup useful to try, provided the full experiments back the one-step result. Send it to review; the idea is concrete enough and the claims are falsifiable once the numbers and code are on the table.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Action-to-Action Flow Matching (A2A), a flow-matching policy for robotics that initializes the generative process from an embedding of historical proprioceptive action sequences rather than random Gaussian noise. This design is claimed to enable high-quality action generation in a single inference step while capturing physical dynamics and temporal continuity, yielding superior robustness to visual perturbations, enhanced generalization to unseen configurations, high training efficiency, and fast inference. The approach is also extended to video generation.

Significance. If the single-step and robustness claims hold, A2A would meaningfully reduce inference latency for generative robotic policies, addressing a key deployment bottleneck. The use of proprioceptive history as a structured starting point and the extension to video generation indicate potential for broader temporal modeling tasks. Reproducibility is supported by the linked project site.

major comments (2)

[§3 (Method) and Experiments] The central single-step claim (abstract and §3) rests on the assumption that the proprioceptive embedding produces a flow starting point close enough for one Euler step to reach the target action distribution. This implicitly requires that consecutive actions lie on approximately straight paths in the learned metric and that the embedding encodes sufficient dynamics; neither is isolated by ablations comparing the embedding against a standard flow-matching baseline with random initialization.
[Abstract and §4 (Experiments)] The abstract asserts 'extensive experiments' with performance gains, robustness, and generalization but supplies no quantitative metrics, baselines, or tables. Without these data the superiority claims cannot be evaluated against the stress-test concern that a non-recurrent embedding may yield an uninformed start.
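
The straightness requirement in the first major comment can be made explicit under a common formulation. The notation here is an assumption, since this page does not reproduce the paper's own equations: a linear interpolation path as in rectified flow, with source $x_0$ (Gaussian noise in the standard setup, the history embedding in A2A as described), target action $x_1$, conditioning $c$, and learned velocity $v_\theta$.

```latex
\begin{align}
x_t &= (1 - t)\,x_0 + t\,x_1, \qquad t \in [0, 1] \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\,x_0,\,x_1}\bigl\| v_\theta(x_t, t, c) - (x_1 - x_0) \bigr\|^2 \\
\hat{x}_1 &= x_0 + v_\theta(x_0, 0, c) \\
\hat{x}_1 - x_1 &= v_\theta(x_0, 0, c) - (x_1 - x_0)
\end{align}
```

Under these assumptions the error of a single Euler step equals the velocity error at $t = 0$. The minimizer of the loss at $t = 0$ is the conditional mean $\mathbb{E}[x_1 - x_0 \mid x_0, c]$, so one step returns $\mathbb{E}[x_1 \mid x_0, c]$. That is accurate only to the extent that the start and the conditioning together pin down the target action. An independent Gaussian start carries no information about $x_1$, whereas a history embedding might, which is what the requested ablation would test.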

minor comments (2)

[§3.2] Specify the exact architecture (recurrent or otherwise) and dimensionality of the proprioceptive embedding, and clarify how historical sequences are tokenized or aggregated.
[Related Work] Add explicit comparison to other few-step or single-step flow-matching or consistency-model baselines in robotics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that strengthening the isolation of the proprioceptive initialization and ensuring quantitative results are clearly presented will improve the manuscript. We address each major comment below and will incorporate revisions as noted.

Referee: [§3 (Method) and Experiments] The central single-step claim (abstract and §3) rests on the assumption that the proprioceptive embedding produces a flow starting point close enough for one Euler step to reach the target action distribution. This implicitly requires that consecutive actions lie on approximately straight paths in the learned metric and that the embedding encodes sufficient dynamics; neither is isolated by ablations comparing the embedding against a standard flow-matching baseline with random initialization.

Authors: We agree that an explicit ablation isolating the effect of the proprioceptive embedding versus random initialization is necessary to rigorously support the single-step claim. In the revised manuscript we will add a new ablation in §4 that directly compares A2A (proprioceptive latent initialization) to an otherwise identical flow-matching baseline initialized from standard Gaussian noise. We will also add a short discussion of the learned metric and the degree to which consecutive actions follow approximately straight trajectories under the trained vector field.
Revision: yes

Referee: [Abstract and §4 (Experiments)] The abstract asserts 'extensive experiments' with performance gains, robustness, and generalization but supplies no quantitative metrics, baselines, or tables. Without these data the superiority claims cannot be evaluated against the stress-test concern that a non-recurrent embedding may yield an uninformed start.

Authors: We apologize for any lack of clarity in the reviewed version. Section 4 of the full manuscript already contains quantitative tables and figures comparing A2A against diffusion-policy and flow-matching baselines on success rate, inference latency, robustness to visual perturbations, and generalization to unseen configurations. In the revision we will (i) update the abstract to cite specific metrics (e.g., “single-step inference with 15% higher success rate and 8× lower latency”), (ii) ensure all tables are referenced from the abstract and introduction, and (iii) incorporate the new random-initialization ablation to directly address the concern that the embedding could be uninformed.
Revision: yes
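
One way to probe the straightness question from the first response, sketched under heavy assumptions: compare the endpoint of a single Euler step with a fine-grained integration of the same field, averaged per initialization scheme. The function names, the toy fields, and the `history_embedding` samples are hypothetical placeholders. In an actual ablation the velocity function would be the trained network, and the starts would come from Gaussian noise and from the paper's encoder.

```python
import numpy as np


def euler_sample(x0, velocity_fn, num_steps):
    """Forward-Euler integration of dx/dt = velocity_fn(x, t) over t in [0, 1]."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


def one_step_gap(x0, velocity_fn, fine_steps=100):
    """Distance between the 1-step endpoint and a fine-grained reference endpoint."""
    coarse = euler_sample(x0, velocity_fn, num_steps=1)
    reference = euler_sample(x0, velocity_fn, num_steps=fine_steps)
    return float(np.linalg.norm(coarse - reference))


def compare_initializations(starts, velocity_fn, fine_steps=100):
    """Mean one-step gap for each named batch of starting points."""
    return {
        name: float(np.mean([one_step_gap(x0, velocity_fn, fine_steps) for x0 in batch]))
        for name, batch in starts.items()
    }


# Toy velocity fields (assumptions, not trained models).
def straight_field(x, t):
    return np.array([1.0, -2.0])           # constant velocity: perfectly straight paths


ROTATION = np.array([[0.0, -np.pi / 2], [np.pi / 2, 0.0]])


def curved_field(x, t):
    return ROTATION @ x                    # rotation about the origin: curved paths


rng = np.random.default_rng(0)
starts = {
    "gaussian": rng.normal(size=(32, 2)),
    # Placeholder samples standing in for encoder outputs.
    "history_embedding": 0.1 * rng.normal(size=(32, 2)) + np.array([1.0, 0.0]),
}
straight_gaps = compare_initializations(starts, straight_field)
curved_gaps = compare_initializations(starts, curved_field)
```

A gap near zero means one step and many steps agree, so extra steps buy nothing. A large gap means the single-step result depends on curvature that one Euler step cannot follow. The diagnostic says nothing about task success, which would still need the success-rate tables the responses refer to.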

Circularity Check

0 steps flagged

No significant circularity; single-step claim presented as empirical outcome of proprioceptive initialization in flow matching

full rationale

The provided abstract and description introduce A2A by replacing random Gaussian noise with an embedding of historical proprioceptive sequences as the flow starting point. No equations appear that reduce the single-inference-step result to a fitted parameter renamed as prediction or to a self-referential definition. The claim that the embedding captures physical dynamics and temporal continuity is stated as a design choice whose effectiveness is asserted via experiments, not derived tautologically from the method's own inputs. No self-citations, uniqueness theorems, or ansatzes smuggled via prior work are invoked in the text to force the architecture or outcome. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proposal rests on the assumption that flow matching can operate effectively from an informed latent initialization derived from proprioceptive history; no free parameters or new entities are named in the abstract.

axioms (1)

Domain assumption: Proprioceptive action sequences can be embedded into a high-dimensional latent space that serves as a sufficient starting point for flow matching.
This is the central design choice stated in the abstract.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Feedback World Model Enables Precise Guidance of Diffusion Policy
cs.RO · 2026-05 · unverdicted · novelty 6.0

Feedback world model closes the prediction-observation loop at inference time to correct errors and improve diffusion policy performance under distribution shift in robotics.

FLASH: Efficient Visuomotor Policy via Sparse Sampling
cs.RO · 2026-05 · unverdicted · novelty 6.0

FLASH Policy uses sparse Legendre polynomial trajectory fitting and history-anchored flow matching to enable single-step inference for visuomotor control, reporting 31.4 ms per-episode latency and >=92% success on fiv...

WarmPrior: Straightening Flow-Matching Policies with Temporal Priors
cs.LG · 2026-05 · unverdicted · novelty 6.0

Replacing Gaussian noise with a temporally grounded prior from recent actions straightens flow-matching paths and improves success rates in robotic manipulation and prior-space RL.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · cited by 3 Pith papers · 10 internal anchors

[1] Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Jaewon Min, Minjae Kim, Wooseok Jang, Hyoungwon Cho, Sayak Paul, SeonHwa Kim, Eunju Cha, et al. A noise is worth diffusion guidance. arXiv preprint arXiv:2412.03895, 2024.

[2] Johan Bjorck, Fernando Castañeda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, et al. GR00T N1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734.

[3] Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al. π0: A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164.

[4] Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127.

[5] Yufan Deng, Zilin Pan, Hongyu Zhang, Xiaojie Li, Ruoqing Hu, Yufei Ding, Yiming Zou, Yan Zeng, and Daquan Zhou. Rethinking video generation model for the embodied world. arXiv preprint arXiv:2601.15282, 2026.

[6] Dechen Gao, Boqi Zhao, Andrew Lee, Ian Chuang, Hanchu Zhou, Hang Wang, Zhe Zhao, Junshan Zhang, and Iman Soltani. VITA: Vision-to-action flow matching policy. arXiv preprint arXiv:2507.13231, 2026.

[7] Haoran Geng, Feishi Wang, Songlin Wei, Yuyang Li, Bangjun Wang, Boshi An, Charlie Tianyue Cheng, Haozhe Lou, Peihao Li, Yen-Jen Wang, et al. Roboverse: Towards a unified platform, dataset and benchmark for scalable and generalizable robot learning. arXiv preprint arXiv:2504.18904, 2025a. Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaimi...

[8] Physical Intelligence, Ali Amin, Raichelle Aniceto, Ashwin Balakrishna, Kevin Black, Ken Conley, Grace Connors, James Darpinian, Karan Dhabalia, Jared DiCarlo, et al. $\pi^{*}_{0.6}$: A VLA that learns from experience. arXiv preprint arXiv:2511.14759, 2025a. Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, ...

[9] Lin Li, Qihang Zhang, Yiming Luo, Shuai Yang, Ruilin Wang, Fei Han, Mingrui Yu, Zelin Gao, Nan Xue, Xing Zhu, et al. Causal world modeling for robot control. arXiv preprint arXiv:2601.21998.

[10] Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747.

[11] Qiang Liu. Rectified flow: A marginal preserving approach to optimal transport. arXiv preprint arXiv:2209.14577.

[12] Yiyang Lu, Susie Lu, Qiao Sun, Hanhong Zhao, Zhicheng Jiang, Xianbang Wang, Tianhong Li, Zhengyang Geng, and Kaiming He. One-step latent-free image generation with pixel mean flows. arXiv preprint arXiv:2601.22158.

[13] Tongzhou Mu, Zhan Ling, Fanbo Xiang, Derek Yang, Xuanlin Li, Stone Tao, Zhiao Huang, Zhiwei Jia, and Hao Su. ManiSkill: Generalizable manipulation skill benchmark with large-scale demonstrations. arXiv preprint arXiv:2107.14483.

[14] Chaoyi Pan, Giri Anantharaman, Nai-Chieh Huang, Claire Jin, Daniel Pfrommer, Chenyang Yuan, Frank Permenter, Guannan Qu, Nicholas Boffi, Guanya Shi, et al. Much ado about noising: Dispelling the myths of generative robotic control. arXiv preprint arXiv:2512.01809, 2025.

[15] Jonas Scholz and Richard E Turner. Warm starts accelerate conditional diffusion. arXiv preprint arXiv:2507.09212.

[16] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020a. Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020b. Ashish Vaswani, Noam ...

[17] Andrew Wagenmaker, Mitsuhiko Nakamoto, Yunchu Zhang, Seohong Park, Waleed Yagoub, Anusha Nagabandi, Abhishek Gupta, and Sergey Levine. Steering your diffusion policy with latent space reinforcement learning. arXiv preprint arXiv:2506.15799.

[18] Dreamvla: a vision-language-action model dreamed with comprehensive world knowledge
Juntu Zhao, Wenbo Lu, Di Zhang, Yufeng Liu, Yushen Liang, Tianluo Zhang, Yifeng Cao, Junyuan Xie, Yingdong Hu, Shengjie Wang, et al. Do you need proprioceptive states in visuomotor policies? arXiv preprint arXiv:2509.18644, 2025a. Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, e...
