pith. sign in

arxiv: 2606.21014 · v1 · pith:UE4JHSZFnew · submitted 2026-06-19 · 💻 cs.RO

BayesFP: Posterior Estimation for Flow-Based Policies via Feynman-Kac Sampling

Pith reviewed 2026-06-26 14:38 UTC · model grok-4.3

classification 💻 cs.RO
keywords Bayesian posterior samplingFeynman-Kac samplingflow-matching policiesdiffusion policiestrajectory generationconstrained generationrobot manipulationinference-time adaptation
0
0 comments X

The pith

Constrained trajectory generation for pretrained diffusion and flow-matching policies reduces to Bayesian posterior sampling solved at inference time via an extended Feynman-Kac framework.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats the problem of making pretrained robot policies obey new safety constraints and task goals as a Bayesian update: the policy's learned distribution over demonstrations serves as the prior, while an inference-time cost function supplies the likelihood that tilts samples toward feasible trajectories. Sampling from the resulting posterior is performed by adapting the Feynman-Kac corrector, originally developed for diffusion models, so that it also works for deterministic flow-matching policies. Because the method requires no retraining or architectural changes to the base policy, it supplies a single inference-time procedure usable across both families of generative policies. A reader would care because it turns existing checkpoints into flexible planners that can handle obstacles or objectives introduced only after training.

Core claim

Constrained trajectory generation for pretrained diffusion and flow-matching policies can be formulated as Bayesian posterior sampling in which the learned demonstration distribution is the prior and an inference-time, cost-derived likelihood tilts the distribution toward feasible and optimal trajectories; the Feynman-Kac corrector framework, extended to deterministic flow-matching policies, then yields a unified, retraining-free sampler that draws from this posterior.

What carries the argument

The Feynman-Kac corrector framework extended from diffusion models to deterministic flow-matching policies, which performs the posterior sampling step.

If this is right

  • The same sampler works without modification for both diffusion policies and flow-matching policies.
  • Constraints such as non-convex obstacle avoidance can be introduced only at inference time and still produce valid trajectories.
  • Performance improves over the base policy on zero-shot manipulation tasks in both simulation and real-world settings.
  • No retraining or fine-tuning of the original checkpoints is required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be combined with online replanning loops to handle time-varying constraints.
  • If the cost function is differentiable, gradients of the likelihood could be used to further accelerate sampling.
  • The formulation suggests that similar posterior-tilting techniques might apply to other autoregressive or energy-based policies.

Load-bearing premise

The Feynman-Kac corrector framework can be extended to deterministic flow-matching policies in a way that correctly samples from the defined posterior without any retraining of the base policy.

What would settle it

Running the sampler on a low-dimensional toy problem where the exact posterior can be computed by rejection sampling or MCMC and observing that the generated trajectories do not match the exact posterior distribution in support or moments would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.21014 by Fabio Ramos, Sreevardhan Sirigiri, Weiming Zhi.

Figure 1
Figure 1. Figure 1: Toy 2D environments with obstacle constraints. Top: the diffusion model is trained on demonstrations that avoid the blue circular regions, while the red obstacle constraints are introduced only at inference time. We compare no guidance, a naive linear-combination base￾line that replaces the learned score ∇ log pt(x) by ∇ log pt(x) + λt∇J (x), and our BayesFP (Bayes Flow-based Policies). Bottom: the model i… view at source ↗
Figure 2
Figure 2. Figure 2: Scene visualization of the Robomimic and LIBERO tasks. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Scene visualization of the Robolab tasks. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Scene visualization of the real-world tasks [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Ablations on LIBERO-Object with a cylindrical obstacle (radius [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
read the original abstract

Robots must generate trajectories that remain faithful to learned expert behavior while satisfying safety constraints and task-specific objectives specified only at inference time. We formulate constrained trajectory generation for pretrained diffusion and flow-matching policies as Bayesian posterior sampling, with the learned demonstration distribution as a prior and an inference-time, cost-derived likelihood tilting it toward feasible, optimal trajectories. To sample from this posterior without any retraining of the base policy, we leverage the Feynman--Kac corrector framework, originally formulated for diffusion models, and extend it to deterministic flow-matching policies. The result is a unified, inference-time, retraining-free sampler for diffusion and flow policies. We validate the approach on pretrained Diffusion Policy, GR00T-N1.6, and $\pi_{0.5}$ checkpoints across simulated and real-world manipulation tasks, including planning around non-convex obstacles introduced at inference time, and show improvements over the base $\pi_{0.5}$ on zero-shot tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to formulate constrained trajectory generation for pretrained diffusion and flow-matching policies as Bayesian posterior sampling, with the learned demonstration distribution as prior and an inference-time cost-derived likelihood. It extends the Feynman-Kac corrector framework (originally for diffusion SDEs) to deterministic flow-matching ODEs to enable exact sampling from the posterior without retraining the base policy. The result is presented as a unified inference-time sampler, validated on Diffusion Policy, GR00T-N1.6, and π_{0.5} checkpoints across simulated and real-world manipulation tasks with non-convex obstacles introduced at inference time.

Significance. If the extension to flow-matching policies is shown to yield exact posterior samples, the work offers a retraining-free approach to incorporating dynamic constraints into flow-based robot policies. This could have practical significance for robotics, where safety and task objectives often arise only at inference time, and would unify handling of both stochastic diffusion and deterministic flow policies under one framework.

major comments (2)
  1. [§3 (Feynman-Kac extension to flow-matching)] The extension of the Feynman-Kac corrector to deterministic flow-matching policies is load-bearing for the central claim of exact posterior sampling. The abstract asserts that the framework extends such that the likelihood tilts the velocity field to sample from p(trajectory | constraints), but the original construction relies on stochasticity for the corrector; the methods derivation must explicitly show how the deterministic ODE preserves the posterior measure (e.g., via the appropriate likelihood gradient incorporation) or the generated trajectories are not guaranteed to be exact samples.
  2. [§5 (Experiments)] Table 2 (or equivalent results table): the reported improvements over base π_{0.5} on zero-shot tasks support practicality but do not directly test whether samples are drawn from the defined posterior versus an approximation; without metrics such as constraint violation rates under the exact posterior or comparison to rejection sampling baselines, the claim of exact sampling remains unverified.
minor comments (2)
  1. [§2] Notation for the likelihood function and cost-derived tilting term should be introduced with an explicit equation early in the methods to avoid ambiguity when reading the extension claim.
  2. [Figure 3] Figure 3 caption: clarify whether the visualized trajectories are single samples or aggregated statistics, as this affects interpretation of constraint satisfaction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [§3 (Feynman-Kac extension to flow-matching)] The extension of the Feynman-Kac corrector to deterministic flow-matching policies is load-bearing for the central claim of exact posterior sampling. The abstract asserts that the framework extends such that the likelihood tilts the velocity field to sample from p(trajectory | constraints), but the original construction relies on stochasticity for the corrector; the methods derivation must explicitly show how the deterministic ODE preserves the posterior measure (e.g., via the appropriate likelihood gradient incorporation) or the generated trajectories are not guaranteed to be exact samples.

    Authors: We appreciate the referee's emphasis on the need for an explicit demonstration of measure preservation in the deterministic case. Section 3 derives the extension by showing that the cost-derived likelihood is incorporated as a multiplicative tilt on the flow velocity field; because the flow ODE defines a deterministic transport map, this tilt yields the desired posterior measure without requiring additional stochasticity. To strengthen clarity, we will expand the derivation with an explicit proof sketch of measure invariance under the modified velocity field. revision: yes

  2. Referee: [§5 (Experiments)] Table 2 (or equivalent results table): the reported improvements over base π_{0.5} on zero-shot tasks support practicality but do not directly test whether samples are drawn from the defined posterior versus an approximation; without metrics such as constraint violation rates under the exact posterior or comparison to rejection sampling baselines, the claim of exact sampling remains unverified.

    Authors: We agree that the current experiments emphasize task-level performance rather than direct verification of exact posterior sampling. We will add, in the revised manuscript, a controlled comparison against rejection sampling on a low-dimensional synthetic task (where exact sampling is tractable) together with quantitative constraint-violation statistics to provide empirical support for the exactness claim. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation builds on external Feynman-Kac framework

full rationale

The paper formulates posterior sampling by extending the Feynman-Kac corrector (originally for diffusion SDEs) to flow-matching ODEs, then validates on pretrained checkpoints without retraining. No equations or steps in the provided text reduce a claimed result to a fitted parameter, self-definition, or self-citation chain; the central extension is presented as a technical contribution leveraging an independent prior framework rather than re-deriving its own inputs. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; all such elements would require the full manuscript.

pith-pipeline@v0.9.1-grok · 5698 in / 1007 out tokens · 17839 ms · 2026-06-26T14:38:10.800521+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

53 extracted references · 3 canonical work pages

  1. [1]

    Janner, Y

    M. Janner, Y . Du, J. Tenenbaum, and S. Levine. Planning with diffusion for flexible behavior synthesis. InInternational Conference on Machine Learning (ICML), 2022

  2. [2]

    C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion.The International Journal of Robotics Research, 2024

  3. [3]

    Lipman, R

    Y . Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling. InInternational Conference on Learning Representations (ICLR), 2023

  4. [4]

    Ye and M

    S. Ye and M. C. Gombolay. Efficient trajectory forecasting and generation with conditional flow matching. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024

  5. [5]

    Jiang, X

    S. Jiang, X. Fang, N. Roy, T. Lozano-Pérez, L. P. Kaelbling, and S. Ancha. Streaming flow policy: Simplifying diffusion / flow-matching policies by treating action trajectories as flow trajectories. InConference on Robot Learning (CoRL), 2025

  6. [6]

    Ratliff, M

    N. Ratliff, M. Zucker, J. A. Bagnell, and S. Srinivasa. Chomp: Gradient optimization techniques for efficient motion planning. InIEEE International Conference on Robotics and Automation (ICRA). IEEE, 2009

  7. [7]

    Schulman, J

    J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel. Finding locally optimal, collision-free trajectories with sequential convex optimization. InRobotics: science and systems, 2013

  8. [8]

    A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications. In18th European Control Conference (ECC), 2019

  9. [9]

    Dhariwal and A

    P. Dhariwal and A. Nichol. Diffusion models beat gans on image synthesis.Advances in Neural Information Processing Systems (NeurIPS), 2021

  10. [10]

    Xiao, T.-H

    W. Xiao, T.-H. Wang, C. Gan, R. Hasani, M. Lechner, and D. Rus. Safediffuser: Safe planning with diffusion probabilistic models. InThe Thirteenth International Conference on Learning Representations (ICLR), 2023. 9

  11. [11]

    Römer, A

    R. Römer, A. v. Rohr, and A. Schoellig. Diffusion predictive control with constraints. In Proceedings of the 7th Annual Learning for Dynamics & Control Conference, Proceedings of Machine Learning Research, 2025

  12. [12]

    Lou and S

    A. Lou and S. Ermon. Reflected diffusion models. InInternational Conference on Machine Learning (ICML). PMLR, 2023

  13. [13]

    G.-H. Liu, T. Chen, E. Theodorou, and M. Tao. Mirror diffusion models for constrained and watermarked generation.Advances in Neural Information Processing Systems (NeurIPS), 2023

  14. [14]

    W. Jung, U. A. Mishra, N. R. Arachchige, Y . Chen, D. Xu, and S. Kousik. Joint model-based model-free diffusion for planning with constraints. In9th Annual Conference on Robot Learning,

  15. [15]

    URLhttps://openreview.net/forum?id=E9t1ekt6W9

  16. [16]

    Skreta, T

    M. Skreta, T. Akhound-Sadegh, V . Ohanesian, R. Bondesan, A. Aspuru-Guzik, A. Doucet, R. Brekelmans, A. Tong, and K. Neklyudov. Feynman-kac correctors in diffusion: Annealing, guidance, and product of experts. InForty-second International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=Vhc0KrcqWu

  17. [17]

    Singh and I

    S. Singh and I. Fischer. Stochastic sampling from deterministic flow models, 2024. URL https://arxiv.org/abs/2410.02217

  18. [18]

    Y . Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. InInternational Conference on Learning Representations, 2021

  19. [19]

    Braun, N

    M. Braun, N. Jaquier, L. Rozo, and T. Asfour. Riemannian flow matching policy for robot motion learning, 2024. URLhttps://arxiv.org/abs/2403.10672

  20. [20]

    X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

  21. [21]

    Lambert, A

    A. Lambert, A. Fishman, D. Fox, B. Boots, and F. Ramos. Stein variational model predictive control, 2021. URLhttps://arxiv.org/abs/2011.07641

  22. [22]

    S. Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review. CoRR, abs/1805.00909, 2018. URLhttp://arxiv.org/abs/1805.00909

  23. [23]

    Rawlik, M

    K. Rawlik, M. Toussaint, and S. Vijayakumar. On stochastic optimal control and reinforcement learning by approximate inference. InTwenty-Third International Joint Conference on Artificial Intelligence, 2013

  24. [24]

    Bjorck, F

    NVIDIA, :, J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. J. Fan, Y . Fang, D. Fox, F. Hu, S. Huang, J. Jang, Z. Jiang, J. Kautz, K. Kundalia, L. Lao, Z. Li, Z. Lin, K. Lin, G. Liu, E. Llontop, L. Magne, A. Mandlekar, A. Narayan, S. Nasiriany, S. Reed, Y . L. Tan, G. Wang, Z. Wang, J. Wang, Q. Wang, J. Xiang, Y . Xie, Y . Xu, Z. Xu, S. Ye, Z. ...

  25. [25]

    Intelligence, K

    P. Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y . Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. V...

  26. [26]

    Y . Song, L. Le, Y .-H. Park, J. Wang, J. Shi, L. Liu, J. Gu, E. Eaton, D. Jayaraman, and K. Daniilidis. Omniguide: Universal guidance fields for enhancing generalist robot policies,

  27. [27]

    URLhttps://arxiv.org/abs/2603.10052. 10

  28. [28]

    Y . Wang, L. Wang, Y . Du, B. Sundaralingam, X. Yang, Y .-W. Chao, C. Perez-D’Arpino, D. Fox, and J. Shah. Inference-time policy steering through human interactions, 2025. URL https://arxiv.org/abs/2411.16627

  29. [29]

    J. Long, D. Liu, W. Cai, I. Manchester, and W. Zhi. Constraining streaming flow models for adapting learned robot trajectory distributions, 2026. URL https://arxiv.org/abs/2602. 15567

  30. [30]

    Millane, H

    A. Millane, H. Oleynikova, E. Wirbel, R. Steiner, V . Ramasamy, D. Tingdahl, and R. Siegwart. nvblox: Gpu-accelerated incremental signed distance field mapping, 2024. URL https: //arxiv.org/abs/2311.00626

  31. [31]

    Mandlekar, D

    A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y . Zhu, and R. Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In5th Annual Conference on Robot Learning, 2021. URL https: //openreview.net/forum?id=JrsfBJtDFdI

  32. [32]

    B. Liu, Y . Zhu, C. Gao, Y . Feng, Q. Liu, Y . Zhu, and P. Stone. Libero: Benchmarking knowledge transfer for lifelong robot learning, 2023. URLhttps://arxiv.org/abs/2306.03310

  33. [33]

    Khazatsky, K

    A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y . Chen, K. Ellis, P. D. Fagan, J. Hejna, M. Itkina, M. Lepert, Y . J. Ma, P. T. Miller, J. Wu, S. Belkhale, S. Dass, H. Ha, A. Jain, A. Lee, Y . Lee, M. Memmel, S. Park, I. Radosavovic, K. Wang, A. Zhan, K. Black, C. Chi, K. B. Hatch, S. Lin, J. ...

  34. [34]

    X. Yang, R. Dagli, A. Zook, H. Hadfield, A. Goyal, S. Birchfield, F. Ramos, and J. Tremblay. Robolab: A high-fidelity simulation benchmark for analysis of task generalist policies, 2026. URLhttps://arxiv.org/abs/2604.09860

  35. [35]

    TheRobotStudio and H. Face. Standard Open SO-100 & SO-101 Arms. https://github. com/TheRobotStudio/SO-ARM100

  36. [36]

    S. H. Høeg, Y . Du, and O. Egeland. Fast policy synthesis with variable noise diffusion models. InIEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025

  37. [37]

    Mishra and I

    R. Mishra and I. R. Manchester. Eb-mbd: Emerging-barrier model-based diffusion for safe trajectory optimization in highly constrained environments. InIEEE International Conference on Robotics and Automation (ICRA). IEEE, 2026

  38. [38]

    Uehara, Y

    M. Uehara, Y . Zhao, T. Biancalani, and S. Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review.arXiv preprint arXiv:2407.13734, 2024

  39. [39]

    L. Wu, B. Trippe, C. Naesseth, D. Blei, and J. P. Cunningham. Practical and asymptotically exact conditional sampling in diffusion models. InAdvances in Neural Information Processing Systems, 2024

  40. [40]

    G. V . Cardoso, Y . J. El Idrissi, S. Le Corff, and E. Moulines. Monte Carlo guided diffusion for Bayesian linear inverse problems. InInternational Conference on Learning Representations, 2024. 11

  41. [41]

    Singhal, Z

    R. Singhal, Z. Horvitz, R. Teehan, M. Ren, Z. Yu, K. McKeown, and R. Ranganath. A general framework for inference-time scaling and steering of diffusion models, 2025. URL https://arxiv.org/abs/2501.06848

  42. [42]

    C. A. Naesseth, F. Lindsten, T. B. Schön, et al. Elements of sequential Monte Carlo.Foundations and Trends® in Machine Learning, 12(3):307–392, 2019

  43. [43]

    Douc and O

    R. Douc and O. Cappé. Comparison of resampling schemes for particle filtering. InISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, pages 64–69, 2005

  44. [44]

    S. N. Ethier and T. G. Kurtz.Markov Processes: Characterization and Convergence. John Wiley & Sons, 2009

  45. [45]

    Del Moral.Mean Field Simulation for Monte Carlo Integration

    P. Del Moral.Mean Field Simulation for Monte Carlo Integration. Chapman and Hall, CRC press, 2013

  46. [46]

    Rousset and G

    M. Rousset and G. Stoltz. Equilibrium sampling from nonequilibrium dynamics.Journal of Statistical Physics, 123:1251–1272, 2006

  47. [47]

    Angeli.Interacting particle approximations of Feynman-Kac measures for continuous-time jump processes

    L. Angeli.Interacting particle approximations of Feynman-Kac measures for continuous-time jump processes. PhD thesis, University of Warwick, 2020

  48. [48]

    Angeli, S

    L. Angeli, S. Grosskinsky, A. M. Johansen, and A. Pizzoferrato. Rare event simulation for stochastic dynamics in continuous time.Journal of Statistical Physics, 176(5):1185–1210, 2019

  49. [49]

    A. V . Fiacco and G. P. McCormick.Nonlinear Programming. Society for Industrial and Applied Mathematics, 1990. doi:10.1137/1.9781611971316. URL https://epubs.siam.org/doi/ abs/10.1137/1.9781611971316

  50. [50]

    Bertsekas.Nonlinear Programming

    D. Bertsekas.Nonlinear Programming. 01 2003

  51. [51]

    I. H. Dinwoodie. Large deviations techniques and applications (amir dembo and ofer zeitouni). SIAM Review, 36(2):303–304, 1994. doi:10.1137/1036078. URL https://doi.org/10. 1137/1036078

  52. [52]

    C.-R. Hwang. Laplace’s Method Revisited: Weak Convergence of Probability Measures. The Annals of Probability, 8(6):1177 – 1182, 1980. doi:10.1214/aop/1176994579. URL https://doi.org/10.1214/aop/1176994579. 12 Appendix Contents A Related Work 13 B Additional Background 14 C From Deterministic Flow to Equivalent SDE 16 D Resampling Methods 17 E The BayesFP ...

  53. [53]

    good” set Gδ := n x∈X:|h 1(x)| ≤δ, h 2(x)≤δ,L(x)≤ L ⋆ +δ o ,(59) and its complement (the “bad

    for the FKC framework. Sequential Monte Carlo.Since our weights provide a proper weighting scheme for all intermediate distributions [40], we can leverage SMC techniques which reweight trajectories along their simulation. In practice, we find that resampling only over an ‘active interval’ t∈[t min, tmax] is useful for 17 improving sample quality and prese...