BayesFP: Posterior Estimation for Flow-Based Policies via Feynman-Kac Sampling

Fabio Ramos; Sreevardhan Sirigiri; Weiming Zhi

arxiv: 2606.21014 · v1 · pith:UE4JHSZFnew · submitted 2026-06-19 · 💻 cs.RO


Pith reviewed 2026-06-26 14:38 UTC · model grok-4.3

classification 💻 cs.RO

keywords: Bayesian posterior sampling · Feynman-Kac sampling · flow-matching policies · diffusion policies · trajectory generation · constrained generation · robot manipulation · inference-time adaptation


The pith

Constrained trajectory generation for pretrained diffusion and flow-matching policies reduces to Bayesian posterior sampling solved at inference time via an extended Feynman-Kac framework.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats the problem of making pretrained robot policies obey new safety constraints and task goals as a Bayesian update: the policy's learned distribution over demonstrations serves as the prior, while an inference-time cost function supplies the likelihood that tilts samples toward feasible trajectories. Sampling from the resulting posterior is performed by adapting the Feynman-Kac corrector, originally developed for diffusion models, so that it also works for deterministic flow-matching policies. Because the method requires no retraining or architectural changes to the base policy, it supplies a single inference-time procedure usable across both families of generative policies. A reader would care because it turns existing checkpoints into flexible planners that can handle obstacles or objectives introduced only after training.

Core claim

Constrained trajectory generation for pretrained diffusion and flow-matching policies can be formulated as Bayesian posterior sampling in which the learned demonstration distribution is the prior and an inference-time, cost-derived likelihood tilts the distribution toward feasible and optimal trajectories; the Feynman-Kac corrector framework, extended to deterministic flow-matching policies, then yields a unified, retraining-free sampler that draws from this posterior.
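For concreteness, the posterior in the core claim can be written as below. This assumes a trajectory τ, the base policy's density p_θ(τ), a cost J(τ) and a temperature λ > 0; the notation is ours, not necessarily the paper's.

```latex
p(\tau \mid \mathcal{O}) = \frac{p_\theta(\tau)\, \exp\!\big(-\lambda J(\tau)\big)}{Z(\lambda)},
\qquad
Z(\lambda) = \int p_\theta(\tau)\, \exp\!\big(-\lambda J(\tau)\big)\, d\tau .
```

The normalizer Z(λ) is generally intractable, which is why weighted-particle (sequential Monte Carlo) schemes that need only unnormalized weights are a natural fit. Under regularity conditions, increasing λ concentrates the posterior on low-cost trajectories within the prior's support (Laplace's method, cf. [52]).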

What carries the argument

The Feynman-Kac corrector framework extended from diffusion models to deterministic flow-matching policies, which performs the posterior sampling step.

If this is right

The same sampler works without modification for both diffusion policies and flow-matching policies.
Constraints such as non-convex obstacle avoidance can be introduced only at inference time and still produce valid trajectories.
Performance improves over the base policy on zero-shot manipulation tasks in both simulation and real-world settings.
No retraining or fine-tuning of the original checkpoints is required.
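The inference-time obstacle constraints in the list above, and the constraint-violation rates a referee might ask for, can be scored with a few lines of array code. This is a minimal sketch assuming circular obstacles like those in the toy 2D environments of Figure 1 and trajectories stored as (N, T, 2) waypoint arrays; the function names and the `margin` parameter are hypothetical, not taken from the paper.

```python
import numpy as np

def min_signed_distance(points, centers, radii):
    """Signed distance from each 2D point to the nearest circular obstacle.

    points: (..., 2) positions; centers: (K, 2); radii: (K,).
    Negative values mean the point lies inside an obstacle.
    """
    # Broadcast to (..., K): distance from every point to every obstacle boundary
    d = np.linalg.norm(points[..., None, :] - centers, axis=-1) - radii
    return d.min(axis=-1)

def violation_rate(trajs, centers, radii, margin=0.0):
    """Fraction of trajectories (N, T, 2) with any waypoint closer than `margin`."""
    sd = min_signed_distance(trajs, centers, radii)   # (N, T)
    return float(np.mean(np.any(sd < margin, axis=-1)))

# Usage: one straight path through an obstacle, one that stays clear of it
t = np.linspace(-1.0, 1.0, 21)
through = np.stack([t, np.zeros_like(t)], axis=-1)    # passes through (0, 0)
clear = np.stack([t, np.ones_like(t)], axis=-1)       # stays on y = 1
trajs = np.stack([through, clear])                    # (2, 21, 2)
centers = np.array([[0.0, 0.0]])
radii = np.array([0.5])
print(violation_rate(trajs, centers, radii))          # 0.5
```

A union of circles already gives a non-convex free space, so even this simple scorer exercises the setting the list describes.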

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could be combined with online replanning loops to handle time-varying constraints.
If the cost function is differentiable, gradients of the likelihood could be used to further accelerate sampling.
The formulation suggests that similar posterior-tilting techniques might apply to other autoregressive or energy-based policies.
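To make the gradient remark above concrete: if the likelihood takes the form exp(−λJ) (our convention; sign conventions vary across papers), the gradient of the log-likelihood is simply −λ∇J. The sketch below uses a hypothetical squared-hinge penalty on the signed distance to circular obstacles and checks its analytic gradient against central finite differences. `obstacle_cost_and_grad`, `log_likelihood_grad` and `margin` are illustrative names, not the paper's.

```python
import numpy as np

def obstacle_cost_and_grad(traj, centers, radii, margin=0.1):
    """Squared-hinge obstacle cost J(traj) and its gradient dJ/dtraj.

    traj: (T, 2) waypoints; centers: (K, 2); radii: (K,).
    J = sum_{t,k} max(0, margin - sd_{t,k})^2 with sd the signed distance
    from waypoint t to obstacle k.
    """
    diff = traj[:, None, :] - centers[None, :, :]           # (T, K, 2)
    dist = np.linalg.norm(diff, axis=-1)                     # (T, K)
    pen = np.maximum(0.0, margin - (dist - radii))           # (T, K)
    cost = float(np.sum(pen ** 2))
    # d sd / d traj is the unit vector pointing away from the centre
    unit = diff / np.maximum(dist, 1e-12)[..., None]
    # dJ/d sd = -2 * pen; chain rule, then sum over obstacles
    grad = np.sum((-2.0 * pen)[..., None] * unit, axis=1)    # (T, 2)
    return cost, grad

def log_likelihood_grad(traj, centers, radii, lam=1.0, margin=0.1):
    """Gradient of log exp(-lam * J) = -lam * J with respect to the waypoints."""
    _, g = obstacle_cost_and_grad(traj, centers, radii, margin)
    return -lam * g

# Finite-difference check on random waypoints
rng = np.random.default_rng(0)
traj = rng.uniform(-1.0, 1.0, size=(8, 2))
centers = np.array([[0.0, 0.0], [0.6, -0.4]])
radii = np.array([0.5, 0.3])
_, g = obstacle_cost_and_grad(traj, centers, radii)
eps = 1e-6
fd = np.zeros_like(traj)
for idx in np.ndindex(traj.shape):
    tp, tm = traj.copy(), traj.copy()
    tp[idx] += eps
    tm[idx] -= eps
    fd[idx] = (obstacle_cost_and_grad(tp, centers, radii)[0]
               - obstacle_cost_and_grad(tm, centers, radii)[0]) / (2 * eps)
print(np.max(np.abs(fd - g)))
```

The squared hinge is chosen because it is continuously differentiable, unlike a plain hinge, so its gradient is well defined everywhere except at obstacle centres.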

Load-bearing premise

The Feynman-Kac corrector framework can be extended to deterministic flow-matching policies in a way that correctly samples from the defined posterior without any retraining of the base policy.

What would settle it

Running the sampler on a low-dimensional toy problem where the exact posterior can be computed by rejection sampling or MCMC and observing that the generated trajectories do not match the exact posterior distribution in support or moments would falsify the central claim.
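The falsification test above can be run in miniature. This is an assumption-laden toy, not the paper's experiment: a 1D standard normal stands in for the base policy, and the cost is J(x) = (x − 1)²/2 with λ = 1, so the exact posterior is N(0.5, 0.5). Rejection sampling (accepting prior draws with probability exp(−λJ) ≤ 1) gives exact reference samples; `weighted_sample` is a generic importance-weighting-plus-systematic-resampling stand-in for the sampler under test, not BayesFP's Feynman-Kac corrector.

```python
import numpy as np

LAM = 1.0  # hypothetical inverse temperature on the cost

def cost(x):
    # Toy quadratic cost pulling samples toward x = 1 (illustrative choice)
    return 0.5 * (x - 1.0) ** 2

def rejection_sample(n, rng):
    """Exact posterior samples: accept prior draws with prob exp(-LAM * J) <= 1."""
    x = rng.standard_normal(n)                      # prior N(0, 1) stands in for the policy
    keep = rng.random(n) < np.exp(-LAM * cost(x))
    return x[keep]

def systematic_resample(weights, rng):
    """Indices drawn by systematic resampling from normalised weights."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0                                   # guard against round-off
    return np.searchsorted(cdf, positions)

def weighted_sample(n, rng):
    """Self-normalised importance weights on prior draws, then resampling."""
    x = rng.standard_normal(n)
    logw = -LAM * cost(x)
    w = np.exp(logw - logw.max())                   # stabilise before normalising
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)                      # effective sample size diagnostic
    return x[systematic_resample(w, rng)], ess

rng = np.random.default_rng(0)
exact_mean, exact_var = 0.5, 0.5                    # closed form for N(0,1) * exp(-(x-1)^2/2)
rs = rejection_sample(200_000, rng)
ws, ess = weighted_sample(200_000, rng)
print(f"rejection: mean={rs.mean():.3f} var={rs.var():.3f}")
print(f"weighted:  mean={ws.mean():.3f} var={ws.var():.3f} ESS={ess:.0f}")
```

To test a real sampler, replace `weighted_sample` with it and compare supports and moments against the rejection-sampling reference; the effective sample size is a cheap diagnostic for weight degeneracy.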

Figures

Figures reproduced from arXiv: 2606.21014 by Fabio Ramos, Sreevardhan Sirigiri, Weiming Zhi.

**Figure 1.** Toy 2D environments with obstacle constraints. Top: the diffusion model is trained on demonstrations that avoid the blue circular regions, while the red obstacle constraints are introduced only at inference time. We compare no guidance, a naive linear-combination baseline that replaces the learned score $\nabla \log p_t(x)$ by $\nabla \log p_t(x) + \lambda_t \nabla J(x)$, and our BayesFP (Bayes Flow-based Policies). Bottom: the model i…

**Figure 2.** Scene visualization of the Robomimic and LIBERO tasks.

**Figure 3.** Scene visualization of the Robolab tasks.

**Figure 4.** Scene visualization of the real-world tasks.

**Figure 5.** Ablations on LIBERO-Object with a cylindrical obstacle (radius …

Original abstract

Robots must generate trajectories that remain faithful to learned expert behavior while satisfying safety constraints and task-specific objectives specified only at inference time. We formulate constrained trajectory generation for pretrained diffusion and flow-matching policies as Bayesian posterior sampling, with the learned demonstration distribution as a prior and an inference-time, cost-derived likelihood tilting it toward feasible, optimal trajectories. To sample from this posterior without any retraining of the base policy, we leverage the Feynman-Kac corrector framework, originally formulated for diffusion models, and extend it to deterministic flow-matching policies. The result is a unified, inference-time, retraining-free sampler for diffusion and flow policies. We validate the approach on pretrained Diffusion Policy, GR00T-N1.6, and $\pi_{0.5}$ checkpoints across simulated and real-world manipulation tasks, including planning around non-convex obstacles introduced at inference time, and show improvements over the base $\pi_{0.5}$ on zero-shot tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper extends Feynman-Kac sampling to flow-matching policies for inference-time posterior sampling of constrained robot trajectories, but the deterministic extension is the part that needs the most scrutiny.


The core move here is to treat safety constraints and task costs as an inference-time likelihood that tilts a pretrained diffusion or flow policy toward feasible trajectories, then sample from the resulting posterior using an adapted Feynman-Kac corrector. This keeps the base policy frozen.

What is actually new is the claimed extension of the corrector construction from stochastic diffusion SDEs to deterministic flow-matching ODEs, producing a single sampler that works for both families. The validation uses real checkpoints (Diffusion Policy, GR00T-N1.6, π_{0.5}) on manipulation tasks with non-convex obstacles added only at test time, and reports gains over the base π_{0.5}.

The practical upside is clear for robotics work that wants to add constraints without retraining. The experiments target exactly that setting and use multiple models, which is better than toy ablations.

The soft spot is the extension itself. The original Feynman-Kac corrector relies on the stochasticity of the diffusion process; moving it to a deterministic velocity field while preserving the exact posterior measure is not automatic. The abstract asserts that it works, but whether the extension actually preserves the target measure is the load-bearing question. If the paper shows only empirical improvement rather than a derivation that the tilted flow exactly targets the defined posterior, the method is a useful heuristic rather than a guaranteed sampler. Without the full derivation and any accompanying error bounds or convergence arguments, it is hard to judge how tight the guarantee is.
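The concern can be made concrete with a standard identity (the idea behind Singh and Fischer [17]; whether BayesFP uses exactly this construction is an assumption here, not something this review verifies). For any noise schedule g_t ≥ 0, the deterministic flow with learned velocity v_t and the SDE below share the same marginals p_t, so stochasticity, and with it room for Feynman-Kac reweighting, can be injected without changing what the base policy generates. For Gaussian probability paths, the score ∇ log p_t can be recovered from v_t.

```latex
\text{ODE: } \frac{dx_t}{dt} = v_t(x_t),
\qquad
\text{SDE: } dx_t = \Big[v_t(x_t) + \tfrac{1}{2} g_t^2\, \nabla_x \log p_t(x_t)\Big]\,dt + g_t\, dW_t .
\\[6pt]
\partial_t p_t
= -\nabla\!\cdot\!\Big(p_t v_t + \tfrac{1}{2} g_t^2\, p_t \nabla \log p_t\Big) + \tfrac{1}{2} g_t^2\, \Delta p_t
= -\nabla\!\cdot\!\big(p_t v_t\big),
\quad \text{since } p_t \nabla \log p_t = \nabla p_t .
```

The identity preserves only the prior's marginals; whether the added weights then target the tilted posterior exactly is the separate question raised in the referee report.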

This is for researchers in robot learning who already have strong base policies and need a lightweight way to enforce new constraints. It is worth sending to peer review because the problem is real, the approach is practical, and the experiments use actual deployed models. Referees can check the math on the flow extension and ask for tighter controls on the sampling accuracy.

Referee Report

2 major / 2 minor

Summary. The paper claims to formulate constrained trajectory generation for pretrained diffusion and flow-matching policies as Bayesian posterior sampling, with the learned demonstration distribution as prior and an inference-time cost-derived likelihood. It extends the Feynman-Kac corrector framework (originally for diffusion SDEs) to deterministic flow-matching ODEs to enable exact sampling from the posterior without retraining the base policy. The result is presented as a unified inference-time sampler, validated on Diffusion Policy, GR00T-N1.6, and π_{0.5} checkpoints across simulated and real-world manipulation tasks with non-convex obstacles introduced at inference time.

Significance. If the extension to flow-matching policies is shown to yield exact posterior samples, the work offers a retraining-free approach to incorporating dynamic constraints into flow-based robot policies. This could have practical significance for robotics, where safety and task objectives often arise only at inference time, and would unify handling of both stochastic diffusion and deterministic flow policies under one framework.

major comments (2)

[§3 (Feynman-Kac extension to flow-matching)] The extension of the Feynman-Kac corrector to deterministic flow-matching policies is load-bearing for the central claim of exact posterior sampling. The abstract asserts that the framework extends such that the likelihood tilts the velocity field to sample from p(trajectory | constraints), but the original construction relies on stochasticity for the corrector; the methods derivation must explicitly show how the deterministic ODE preserves the posterior measure (e.g., via the appropriate likelihood gradient incorporation) or the generated trajectories are not guaranteed to be exact samples.
[§5 (Experiments)] Table 2 (or equivalent results table): the reported improvements over base π_{0.5} on zero-shot tasks support practicality but do not directly test whether samples are drawn from the defined posterior versus an approximation; without metrics such as constraint violation rates under the exact posterior or comparison to rejection sampling baselines, the claim of exact sampling remains unverified.

minor comments (2)

[§2] Notation for the likelihood function and cost-derived tilting term should be introduced with an explicit equation early in the methods to avoid ambiguity when reading the extension claim.
[Figure 3] Figure 3 caption: clarify whether the visualized trajectories are single samples or aggregated statistics, as this affects interpretation of constraint satisfaction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will make.


Referee: [§3 (Feynman-Kac extension to flow-matching)] The extension of the Feynman-Kac corrector to deterministic flow-matching policies is load-bearing for the central claim of exact posterior sampling. The abstract asserts that the framework extends such that the likelihood tilts the velocity field to sample from p(trajectory | constraints), but the original construction relies on stochasticity for the corrector; the methods derivation must explicitly show how the deterministic ODE preserves the posterior measure (e.g., via the appropriate likelihood gradient incorporation) or the generated trajectories are not guaranteed to be exact samples.

Authors: We appreciate the referee's emphasis on the need for an explicit demonstration of measure preservation in the deterministic case. Section 3 derives the extension by showing that the cost-derived likelihood is incorporated as a multiplicative tilt on the flow velocity field; because the flow ODE defines a deterministic transport map, this tilt yields the desired posterior measure without requiring additional stochasticity. To strengthen clarity, we will expand the derivation with an explicit proof sketch of measure invariance under the modified velocity field. revision: yes
Referee: [§5 (Experiments)] Table 2 (or equivalent results table): the reported improvements over base π_{0.5} on zero-shot tasks support practicality but do not directly test whether samples are drawn from the defined posterior versus an approximation; without metrics such as constraint violation rates under the exact posterior or comparison to rejection sampling baselines, the claim of exact sampling remains unverified.

Authors: We agree that the current experiments emphasize task-level performance rather than direct verification of exact posterior sampling. We will add, in the revised manuscript, a controlled comparison against rejection sampling on a low-dimensional synthetic task (where exact sampling is tractable) together with quantitative constraint-violation statistics to provide empirical support for the exactness claim. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation builds on external Feynman-Kac framework


The paper formulates posterior sampling by extending the Feynman-Kac corrector (originally for diffusion SDEs) to flow-matching ODEs, then validates on pretrained checkpoints without retraining. No equations or steps in the provided text reduce a claimed result to a fitted parameter, self-definition, or self-citation chain; the central extension is presented as a technical contribution leveraging an independent prior framework rather than re-deriving its own inputs. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; all such elements would require the full manuscript.

pith-pipeline@v0.9.1-grok · 5698 in / 1007 out tokens · 17839 ms · 2026-06-26T14:38:10.800521+00:00 · methodology


Reference graph

Works this paper leans on

53 extracted references · 3 canonical work pages

[1] M. Janner, Y. Du, J. Tenenbaum, and S. Levine. Planning with diffusion for flexible behavior synthesis. In International Conference on Machine Learning (ICML), 2022.
[2] C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 2024.
[3] Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling. In International Conference on Learning Representations (ICLR), 2023.
[4] S. Ye and M. C. Gombolay. Efficient trajectory forecasting and generation with conditional flow matching. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024.
[5] S. Jiang, X. Fang, N. Roy, T. Lozano-Pérez, L. P. Kaelbling, and S. Ancha. Streaming flow policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories. In Conference on Robot Learning (CoRL), 2025.
[6] N. Ratliff, M. Zucker, J. A. Bagnell, and S. Srinivasa. CHOMP: Gradient optimization techniques for efficient motion planning. In IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2009.
[7] J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel. Finding locally optimal, collision-free trajectories with sequential convex optimization. In Robotics: Science and Systems, 2013.
[8] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications. In 18th European Control Conference (ECC), 2019.
[9] P. Dhariwal and A. Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems (NeurIPS), 2021.
[10] W. Xiao, T.-H. Wang, C. Gan, R. Hasani, M. Lechner, and D. Rus. SafeDiffuser: Safe planning with diffusion probabilistic models. In The Thirteenth International Conference on Learning Representations (ICLR), 2023.
[11] R. Römer, A. v. Rohr, and A. Schoellig. Diffusion predictive control with constraints. In Proceedings of the 7th Annual Learning for Dynamics & Control Conference, Proceedings of Machine Learning Research, 2025.
[12] A. Lou and S. Ermon. Reflected diffusion models. In International Conference on Machine Learning (ICML). PMLR, 2023.
[13] G.-H. Liu, T. Chen, E. Theodorou, and M. Tao. Mirror diffusion models for constrained and watermarked generation. Advances in Neural Information Processing Systems (NeurIPS), 2023.
[14] W. Jung, U. A. Mishra, N. R. Arachchige, Y. Chen, D. Xu, and S. Kousik. Joint model-based model-free diffusion for planning with constraints. In 9th Annual Conference on Robot Learning. URL https://openreview.net/forum?id=E9t1ekt6W9
[16] M. Skreta, T. Akhound-Sadegh, V. Ohanesian, R. Bondesan, A. Aspuru-Guzik, A. Doucet, R. Brekelmans, A. Tong, and K. Neklyudov. Feynman-Kac correctors in diffusion: Annealing, guidance, and product of experts. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=Vhc0KrcqWu
[17] S. Singh and I. Fischer. Stochastic sampling from deterministic flow models, 2024. URL https://arxiv.org/abs/2410.02217
[18] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
[19] M. Braun, N. Jaquier, L. Rozo, and T. Asfour. Riemannian flow matching policy for robot motion learning, 2024. URL https://arxiv.org/abs/2403.10672
[20] X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022.
[21] A. Lambert, A. Fishman, D. Fox, B. Boots, and F. Ramos. Stein variational model predictive control, 2021. URL https://arxiv.org/abs/2011.07641
[22] S. Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review. CoRR, abs/1805.00909, 2018. URL http://arxiv.org/abs/1805.00909
[23] K. Rawlik, M. Toussaint, and S. Vijayakumar. On stochastic optimal control and reinforcement learning by approximate inference. In Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
[24] NVIDIA, J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. J. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, J. Jang, Z. Jiang, J. Kautz, K. Kundalia, L. Lao, Z. Li, Z. Lin, K. Lin, G. Liu, E. Llontop, L. Magne, A. Mandlekar, A. Narayan, S. Nasiriany, S. Reed, Y. L. Tan, G. Wang, Z. Wang, J. Wang, Q. Wang, J. Xiang, Y. Xie, Y. Xu, Z. Xu, S. Ye, Z. ... (arXiv, 2025)
[25] P. Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y. Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. V... (arXiv, 2025)
[26] Y. Song, L. Le, Y.-H. Park, J. Wang, J. Shi, L. Liu, J. Gu, E. Eaton, D. Jayaraman, and K. Daniilidis. OmniGuide: Universal guidance fields for enhancing generalist robot policies. URL https://arxiv.org/abs/2603.10052
[28] Y. Wang, L. Wang, Y. Du, B. Sundaralingam, X. Yang, Y.-W. Chao, C. Perez-D’Arpino, D. Fox, and J. Shah. Inference-time policy steering through human interactions, 2025. URL https://arxiv.org/abs/2411.16627
[29] J. Long, D. Liu, W. Cai, I. Manchester, and W. Zhi. Constraining streaming flow models for adapting learned robot trajectory distributions, 2026. URL https://arxiv.org/abs/2602.15567
[30] A. Millane, H. Oleynikova, E. Wirbel, R. Steiner, V. Ramasamy, D. Tingdahl, and R. Siegwart. nvblox: GPU-accelerated incremental signed distance field mapping, 2024. URL https://arxiv.org/abs/2311.00626
[31] A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y. Zhu, and R. Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In 5th Annual Conference on Robot Learning, 2021. URL https://openreview.net/forum?id=JrsfBJtDFdI
[32] B. Liu, Y. Zhu, C. Gao, Y. Feng, Q. Liu, Y. Zhu, and P. Stone. LIBERO: Benchmarking knowledge transfer for lifelong robot learning, 2023. URL https://arxiv.org/abs/2306.03310
[33] A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, P. D. Fagan, J. Hejna, M. Itkina, M. Lepert, Y. J. Ma, P. T. Miller, J. Wu, S. Belkhale, S. Dass, H. Ha, A. Jain, A. Lee, Y. Lee, M. Memmel, S. Park, I. Radosavovic, K. Wang, A. Zhan, K. Black, C. Chi, K. B. Hatch, S. Lin, J. ... (arXiv, 2025)
[34] X. Yang, R. Dagli, A. Zook, H. Hadfield, A. Goyal, S. Birchfield, F. Ramos, and J. Tremblay. Robolab: A high-fidelity simulation benchmark for analysis of task generalist policies, 2026. URL https://arxiv.org/abs/2604.09860
[35] TheRobotStudio and Hugging Face. Standard Open SO-100 & SO-101 Arms. https://github.com/TheRobotStudio/SO-ARM100
[36] S. H. Høeg, Y. Du, and O. Egeland. Fast policy synthesis with variable noise diffusion models. In IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025.
[37] R. Mishra and I. R. Manchester. EB-MBD: Emerging-barrier model-based diffusion for safe trajectory optimization in highly constrained environments. In IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2026.
[38] M. Uehara, Y. Zhao, T. Biancalani, and S. Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review. arXiv preprint arXiv:2407.13734, 2024.
[39] L. Wu, B. Trippe, C. Naesseth, D. Blei, and J. P. Cunningham. Practical and asymptotically exact conditional sampling in diffusion models. In Advances in Neural Information Processing Systems, 2024.
[40] G. V. Cardoso, Y. J. El Idrissi, S. Le Corff, and E. Moulines. Monte Carlo guided diffusion for Bayesian linear inverse problems. In International Conference on Learning Representations, 2024.
[41] R. Singhal, Z. Horvitz, R. Teehan, M. Ren, Z. Yu, K. McKeown, and R. Ranganath. A general framework for inference-time scaling and steering of diffusion models, 2025. URL https://arxiv.org/abs/2501.06848
[42] C. A. Naesseth, F. Lindsten, T. B. Schön, et al. Elements of sequential Monte Carlo. Foundations and Trends® in Machine Learning, 12(3):307–392, 2019.
[43] R. Douc and O. Cappé. Comparison of resampling schemes for particle filtering. In ISPA 2005: Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, pages 64–69, 2005.
[44] S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization and Convergence. John Wiley & Sons, 2009.
[45] P. Del Moral. Mean Field Simulation for Monte Carlo Integration. Chapman and Hall/CRC Press, 2013.
[46] M. Rousset and G. Stoltz. Equilibrium sampling from nonequilibrium dynamics. Journal of Statistical Physics, 123:1251–1272, 2006.
[47] L. Angeli. Interacting particle approximations of Feynman-Kac measures for continuous-time jump processes. PhD thesis, University of Warwick, 2020.
[48] L. Angeli, S. Grosskinsky, A. M. Johansen, and A. Pizzoferrato. Rare event simulation for stochastic dynamics in continuous time. Journal of Statistical Physics, 176(5):1185–1210, 2019.
[49] A. V. Fiacco and G. P. McCormick. Nonlinear Programming. Society for Industrial and Applied Mathematics, 1990. doi:10.1137/1.9781611971316. URL https://epubs.siam.org/doi/abs/10.1137/1.9781611971316
[50] D. Bertsekas. Nonlinear Programming. 01/2003.
[51] I. H. Dinwoodie. Large deviations techniques and applications (Amir Dembo and Ofer Zeitouni). SIAM Review, 36(2):303–304, 1994. doi:10.1137/1036078. URL https://doi.org/10.1137/1036078
[52] C.-R. Hwang. Laplace’s method revisited: Weak convergence of probability measures. The Annals of Probability, 8(6):1177–1182, 1980. doi:10.1214/aop/1176994579. URL https://doi.org/10.1214/aop/1176994579
Appendix contents (extracted with the references): A Related Work; B Additional Background; C From Deterministic Flow to Equivalent SDE; D Resampling Methods; E The BayesFP ...
[53] good” set $G_\delta := \{x \in \mathcal{X} : |h_1(x)| \le \delta,\ h_2(x) \le \delta,\ L(x) \le L^\star + \delta\}$ (59), and its complement (the “bad
for the FKC framework. Sequential Monte Carlo. Since our weights provide a proper weighting scheme for all intermediate distributions [40], we can leverage SMC techniques which reweight trajectories along their simulation. In practice, we find that resampling only over an ‘active interval’ $t \in [t_{\min}, t_{\max}]$ is useful for improving sample quality and prese...

[1] [1]

Janner, Y

M. Janner, Y . Du, J. Tenenbaum, and S. Levine. Planning with diffusion for flexible behavior synthesis. InInternational Conference on Machine Learning (ICML), 2022

2022

[2] [2]

C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion.The International Journal of Robotics Research, 2024

2024

[3] [3]

Lipman, R

Y . Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling. InInternational Conference on Learning Representations (ICLR), 2023

2023

[4] [4]

Ye and M

S. Ye and M. C. Gombolay. Efficient trajectory forecasting and generation with conditional flow matching. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024

2024

[5] [5]

Jiang, X

S. Jiang, X. Fang, N. Roy, T. Lozano-Pérez, L. P. Kaelbling, and S. Ancha. Streaming flow policy: Simplifying diffusion / flow-matching policies by treating action trajectories as flow trajectories. InConference on Robot Learning (CoRL), 2025

2025

[6] [6]

Ratliff, M

N. Ratliff, M. Zucker, J. A. Bagnell, and S. Srinivasa. Chomp: Gradient optimization techniques for efficient motion planning. InIEEE International Conference on Robotics and Automation (ICRA). IEEE, 2009

2009

[7] [7]

Schulman, J

J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel. Finding locally optimal, collision-free trajectories with sequential convex optimization. InRobotics: science and systems, 2013

2013

[8] [8]

A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications. In18th European Control Conference (ECC), 2019

2019

[9] [9]

Dhariwal and A

P. Dhariwal and A. Nichol. Diffusion models beat gans on image synthesis.Advances in Neural Information Processing Systems (NeurIPS), 2021

2021

[10] [10]

Xiao, T.-H

W. Xiao, T.-H. Wang, C. Gan, R. Hasani, M. Lechner, and D. Rus. Safediffuser: Safe planning with diffusion probabilistic models. InThe Thirteenth International Conference on Learning Representations (ICLR), 2023. 9

2023

[11] [11]

Römer, A

R. Römer, A. v. Rohr, and A. Schoellig. Diffusion predictive control with constraints. In Proceedings of the 7th Annual Learning for Dynamics & Control Conference, Proceedings of Machine Learning Research, 2025

2025

[12] [12]

Lou and S

A. Lou and S. Ermon. Reflected diffusion models. InInternational Conference on Machine Learning (ICML). PMLR, 2023

2023

[13] [13]

G.-H. Liu, T. Chen, E. Theodorou, and M. Tao. Mirror diffusion models for constrained and watermarked generation.Advances in Neural Information Processing Systems (NeurIPS), 2023

2023

[14] [14]

W. Jung, U. A. Mishra, N. R. Arachchige, Y . Chen, D. Xu, and S. Kousik. Joint model-based model-free diffusion for planning with constraints. In9th Annual Conference on Robot Learning,

[15] [15]

URLhttps://openreview.net/forum?id=E9t1ekt6W9

[16] [16]

Skreta, T

M. Skreta, T. Akhound-Sadegh, V . Ohanesian, R. Bondesan, A. Aspuru-Guzik, A. Doucet, R. Brekelmans, A. Tong, and K. Neklyudov. Feynman-kac correctors in diffusion: Annealing, guidance, and product of experts. InForty-second International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=Vhc0KrcqWu

2025

[17] [17]

Singh and I

S. Singh and I. Fischer. Stochastic sampling from deterministic flow models, 2024. URL https://arxiv.org/abs/2410.02217

arXiv 2024

[18] [18]

Y . Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. InInternational Conference on Learning Representations, 2021

2021

[19] [19]

Braun, N

M. Braun, N. Jaquier, L. Rozo, and T. Asfour. Riemannian flow matching policy for robot motion learning, 2024. URLhttps://arxiv.org/abs/2403.10672

arXiv 2024

[20] [20]

X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

Pith/arXiv arXiv 2022

[21] [21]

Lambert, A

A. Lambert, A. Fishman, D. Fox, B. Boots, and F. Ramos. Stein variational model predictive control, 2021. URLhttps://arxiv.org/abs/2011.07641

arXiv 2021

[22] [22]

S. Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review. CoRR, abs/1805.00909, 2018. URLhttp://arxiv.org/abs/1805.00909

Pith/arXiv arXiv 2018

[23] [23]

Rawlik, M

K. Rawlik, M. Toussaint, and S. Vijayakumar. On stochastic optimal control and reinforcement learning by approximate inference. InTwenty-Third International Joint Conference on Artificial Intelligence, 2013

2013

[24] [24]

Bjorck, F

NVIDIA, :, J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. J. Fan, Y . Fang, D. Fox, F. Hu, S. Huang, J. Jang, Z. Jiang, J. Kautz, K. Kundalia, L. Lao, Z. Li, Z. Lin, K. Lin, G. Liu, E. Llontop, L. Magne, A. Mandlekar, A. Narayan, S. Nasiriany, S. Reed, Y . L. Tan, G. Wang, Z. Wang, J. Wang, Q. Wang, J. Xiang, Y . Xie, Y . Xu, Z. Xu, S. Ye, Z. ...

Pith/arXiv arXiv 2025

[25] [25]

Intelligence, K

P. Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y . Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. V...

Pith/arXiv arXiv 2025

[26] Y. Song, L. Le, Y.-H. Park, J. Wang, J. Shi, L. Liu, J. Gu, E. Eaton, D. Jayaraman, and K. Daniilidis. Omniguide: Universal guidance fields for enhancing generalist robot policies. URL https://arxiv.org/abs/2603.10052

[28] Y. Wang, L. Wang, Y. Du, B. Sundaralingam, X. Yang, Y.-W. Chao, C. Perez-D’Arpino, D. Fox, and J. Shah. Inference-time policy steering through human interactions, 2025. URL https://arxiv.org/abs/2411.16627

[29] J. Long, D. Liu, W. Cai, I. Manchester, and W. Zhi. Constraining streaming flow models for adapting learned robot trajectory distributions, 2026. URL https://arxiv.org/abs/2602.15567

[30] A. Millane, H. Oleynikova, E. Wirbel, R. Steiner, V. Ramasamy, D. Tingdahl, and R. Siegwart. nvblox: GPU-accelerated incremental signed distance field mapping, 2024. URL https://arxiv.org/abs/2311.00626

[31] A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y. Zhu, and R. Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In 5th Annual Conference on Robot Learning, 2021. URL https://openreview.net/forum?id=JrsfBJtDFdI

[32] B. Liu, Y. Zhu, C. Gao, Y. Feng, Q. Liu, Y. Zhu, and P. Stone. Libero: Benchmarking knowledge transfer for lifelong robot learning, 2023. URL https://arxiv.org/abs/2306.03310

[33] A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, P. D. Fagan, J. Hejna, M. Itkina, M. Lepert, Y. J. Ma, P. T. Miller, J. Wu, S. Belkhale, S. Dass, H. Ha, A. Jain, A. Lee, Y. Lee, M. Memmel, S. Park, I. Radosavovic, K. Wang, A. Zhan, K. Black, C. Chi, K. B. Hatch, S. Lin, J. ...

[34] X. Yang, R. Dagli, A. Zook, H. Hadfield, A. Goyal, S. Birchfield, F. Ramos, and J. Tremblay. Robolab: A high-fidelity simulation benchmark for analysis of task generalist policies, 2026. URL https://arxiv.org/abs/2604.09860

[35] TheRobotStudio and Hugging Face. Standard Open SO-100 & SO-101 Arms. https://github.com/TheRobotStudio/SO-ARM100

[36] S. H. Høeg, Y. Du, and O. Egeland. Fast policy synthesis with variable noise diffusion models. In IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025

[37] R. Mishra and I. R. Manchester. Eb-mbd: Emerging-barrier model-based diffusion for safe trajectory optimization in highly constrained environments. In IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2026

[38] M. Uehara, Y. Zhao, T. Biancalani, and S. Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review. arXiv preprint arXiv:2407.13734, 2024

[39] L. Wu, B. Trippe, C. Naesseth, D. Blei, and J. P. Cunningham. Practical and asymptotically exact conditional sampling in diffusion models. In Advances in Neural Information Processing Systems, 2024

[40] G. V. Cardoso, Y. J. El Idrissi, S. Le Corff, and E. Moulines. Monte Carlo guided diffusion for Bayesian linear inverse problems. In International Conference on Learning Representations, 2024

[41] R. Singhal, Z. Horvitz, R. Teehan, M. Ren, Z. Yu, K. McKeown, and R. Ranganath. A general framework for inference-time scaling and steering of diffusion models, 2025. URL https://arxiv.org/abs/2501.06848

[42] C. A. Naesseth, F. Lindsten, T. B. Schön, et al. Elements of sequential Monte Carlo. Foundations and Trends® in Machine Learning, 12(3):307–392, 2019

[43] R. Douc and O. Cappé. Comparison of resampling schemes for particle filtering. In ISPA 2005: Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, pages 64–69, 2005

[44] S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization and Convergence. John Wiley & Sons, 2009

[45] P. Del Moral. Mean Field Simulation for Monte Carlo Integration. Chapman and Hall/CRC Press, 2013

[46] M. Rousset and G. Stoltz. Equilibrium sampling from nonequilibrium dynamics. Journal of Statistical Physics, 123:1251–1272, 2006

[47] L. Angeli. Interacting particle approximations of Feynman-Kac measures for continuous-time jump processes. PhD thesis, University of Warwick, 2020

[48] L. Angeli, S. Grosskinsky, A. M. Johansen, and A. Pizzoferrato. Rare event simulation for stochastic dynamics in continuous time. Journal of Statistical Physics, 176(5):1185–1210, 2019

[49] A. V. Fiacco and G. P. McCormick. Nonlinear Programming. Society for Industrial and Applied Mathematics, 1990. doi:10.1137/1.9781611971316. URL https://epubs.siam.org/doi/abs/10.1137/1.9781611971316

[50] D. Bertsekas. Nonlinear Programming, 2003

[51] I. H. Dinwoodie. Large deviations techniques and applications (Amir Dembo and Ofer Zeitouni). SIAM Review, 36(2):303–304, 1994. doi:10.1137/1036078. URL https://doi.org/10.1137/1036078

[52] C.-R. Hwang. Laplace’s Method Revisited: Weak Convergence of Probability Measures. The Annals of Probability, 8(6):1177–1182, 1980. doi:10.1214/aop/1176994579. URL https://doi.org/10.1214/aop/1176994579

Appendix Contents: A Related Work; B Additional Background; C From Deterministic Flow to Equivalent SDE; D Resampling Methods; E The BayesFP ...

…“good” set $G_\delta := \{\, x \in \mathcal{X} : |h_1(x)| \le \delta,\ h_2(x) \le \delta,\ L(x) \le L^\star + \delta \,\}$ (59), and its complement (the “bad” …

…for the FKC framework. Sequential Monte Carlo. Since our weights provide a proper weighting scheme for all intermediate distributions [40], we can leverage SMC techniques that reweight trajectories as they are simulated. In practice, we find that resampling only over an ‘active interval’ $t \in [t_{\min}, t_{\max}]$ is useful for improving sample quality and prese...
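The excerpt breaks off before the resampling details, so what follows is only a minimal sketch of resampling gated on an active interval, written under our own assumptions rather than taken from BayesFP: each particle carries an accumulated log-weight, resampling uses the systematic scheme (one of the schemes compared by Douc and Cappé [43]; the excerpt does not say which scheme is used), and log-weights are reset to zero after each resampling step. The helper names `systematic_resample` and `maybe_resample` are hypothetical; `t_min` and `t_max` mirror the active interval $[t_{\min}, t_{\max}]$ in the text.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def systematic_resample(weights: Sequence[float], rng: random.Random) -> List[int]:
    """Systematic resampling (assumed scheme): one uniform offset, N evenly spaced positions.

    `weights` must be non-negative with a positive sum; they need not be normalized.
    Returns one ancestor index per output particle.
    """
    n = len(weights)
    total = sum(weights)
    # Normalized cumulative distribution of the particle weights.
    cdf: List[float] = []
    acc = 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    u0 = rng.random()  # single shared uniform draw in [0, 1)
    ancestors: List[int] = []
    j = 0
    for k in range(n):
        position = (u0 + k) / n  # evenly spaced points in [0, 1)
        # Advance to the first index whose cumulative weight exceeds the position;
        # the j < n - 1 guard absorbs floating-point round-off in cdf[-1].
        while j < n - 1 and cdf[j] <= position:
            j += 1
        ancestors.append(j)
    return ancestors


def maybe_resample(
    particles: Sequence[T],
    log_weights: Sequence[float],
    t: float,
    t_min: float,
    t_max: float,
    rng: random.Random,
) -> Tuple[List[T], List[float]]:
    """Hypothetical helper: resample only while t lies in [t_min, t_max]."""
    if not (t_min <= t <= t_max):
        # Outside the active interval: keep particles and accumulated weights unchanged.
        return list(particles), list(log_weights)
    # Subtract the max log-weight before exponentiating for numerical stability.
    m = max(log_weights)
    weights = [math.exp(lw - m) for lw in log_weights]
    ancestors = systematic_resample(weights, rng)
    # After resampling, every surviving particle carries equal weight (log-weight 0).
    return [particles[i] for i in ancestors], [0.0] * len(particles)
```

Outside the interval the importance weights simply keep accumulating; inside it, low-weight trajectories are replaced by copies of high-weight ones. Triggering resampling only when the effective sample size drops below a threshold is a common alternative, not something the excerpt specifies.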