BayesFP: Posterior Estimation for Flow-Based Policies via Feynman-Kac Sampling
Pith reviewed 2026-06-26 14:38 UTC · model grok-4.3
The pith
Constrained trajectory generation for pretrained diffusion and flow-matching policies reduces to Bayesian posterior sampling solved at inference time via an extended Feynman-Kac framework.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Constrained trajectory generation for pretrained diffusion and flow-matching policies can be formulated as Bayesian posterior sampling in which the learned demonstration distribution is the prior and an inference-time, cost-derived likelihood tilts the distribution toward feasible and optimal trajectories; the Feynman-Kac corrector framework, extended to deterministic flow-matching policies, then yields a unified, retraining-free sampler that draws from this posterior.
What carries the argument
The Feynman-Kac corrector framework extended from diffusion models to deterministic flow-matching policies, which performs the posterior sampling step.
If this is right
- The same sampler works without modification for both diffusion policies and flow-matching policies.
- Constraints such as non-convex obstacle avoidance can be introduced only at inference time and still produce valid trajectories.
- Performance improves over the base policy on zero-shot manipulation tasks in both simulation and real-world settings.
- No retraining or fine-tuning of the original checkpoints is required.
Where Pith is reading between the lines
- The approach could be combined with online replanning loops to handle time-varying constraints.
- If the cost function is differentiable, gradients of the likelihood could be used to further accelerate sampling.
- The formulation suggests that similar posterior-tilting techniques might apply to other autoregressive or energy-based policies.
Load-bearing premise
The Feynman-Kac corrector framework can be extended to deterministic flow-matching policies in a way that correctly samples from the defined posterior without any retraining of the base policy.
What would settle it
Running the sampler on a low-dimensional toy problem where the exact posterior can be computed by rejection sampling or MCMC and observing that the generated trajectories do not match the exact posterior distribution in support or moments would falsify the central claim.
Figures
read the original abstract
Robots must generate trajectories that remain faithful to learned expert behavior while satisfying safety constraints and task-specific objectives specified only at inference time. We formulate constrained trajectory generation for pretrained diffusion and flow-matching policies as Bayesian posterior sampling, with the learned demonstration distribution as a prior and an inference-time, cost-derived likelihood tilting it toward feasible, optimal trajectories. To sample from this posterior without any retraining of the base policy, we leverage the Feynman--Kac corrector framework, originally formulated for diffusion models, and extend it to deterministic flow-matching policies. The result is a unified, inference-time, retraining-free sampler for diffusion and flow policies. We validate the approach on pretrained Diffusion Policy, GR00T-N1.6, and $\pi_{0.5}$ checkpoints across simulated and real-world manipulation tasks, including planning around non-convex obstacles introduced at inference time, and show improvements over the base $\pi_{0.5}$ on zero-shot tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to formulate constrained trajectory generation for pretrained diffusion and flow-matching policies as Bayesian posterior sampling, with the learned demonstration distribution as prior and an inference-time cost-derived likelihood. It extends the Feynman-Kac corrector framework (originally for diffusion SDEs) to deterministic flow-matching ODEs to enable exact sampling from the posterior without retraining the base policy. The result is presented as a unified inference-time sampler, validated on Diffusion Policy, GR00T-N1.6, and π_{0.5} checkpoints across simulated and real-world manipulation tasks with non-convex obstacles introduced at inference time.
Significance. If the extension to flow-matching policies is shown to yield exact posterior samples, the work offers a retraining-free approach to incorporating dynamic constraints into flow-based robot policies. This could have practical significance for robotics, where safety and task objectives often arise only at inference time, and would unify handling of both stochastic diffusion and deterministic flow policies under one framework.
major comments (2)
- [§3 (Feynman-Kac extension to flow-matching)] The extension of the Feynman-Kac corrector to deterministic flow-matching policies is load-bearing for the central claim of exact posterior sampling. The abstract asserts that the framework extends such that the likelihood tilts the velocity field to sample from p(trajectory | constraints), but the original construction relies on stochasticity for the corrector; the methods derivation must explicitly show how the deterministic ODE preserves the posterior measure (e.g., via the appropriate likelihood gradient incorporation) or the generated trajectories are not guaranteed to be exact samples.
- [§5 (Experiments)] Table 2 (or equivalent results table): the reported improvements over base π_{0.5} on zero-shot tasks support practicality but do not directly test whether samples are drawn from the defined posterior versus an approximation; without metrics such as constraint violation rates under the exact posterior or comparison to rejection sampling baselines, the claim of exact sampling remains unverified.
minor comments (2)
- [§2] Notation for the likelihood function and cost-derived tilting term should be introduced with an explicit equation early in the methods to avoid ambiguity when reading the extension claim.
- [Figure 3] Figure 3 caption: clarify whether the visualized trajectories are single samples or aggregated statistics, as this affects interpretation of constraint satisfaction.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will make.
read point-by-point responses
-
Referee: [§3 (Feynman-Kac extension to flow-matching)] The extension of the Feynman-Kac corrector to deterministic flow-matching policies is load-bearing for the central claim of exact posterior sampling. The abstract asserts that the framework extends such that the likelihood tilts the velocity field to sample from p(trajectory | constraints), but the original construction relies on stochasticity for the corrector; the methods derivation must explicitly show how the deterministic ODE preserves the posterior measure (e.g., via the appropriate likelihood gradient incorporation) or the generated trajectories are not guaranteed to be exact samples.
Authors: We appreciate the referee's emphasis on the need for an explicit demonstration of measure preservation in the deterministic case. Section 3 derives the extension by showing that the cost-derived likelihood is incorporated as a multiplicative tilt on the flow velocity field; because the flow ODE defines a deterministic transport map, this tilt yields the desired posterior measure without requiring additional stochasticity. To strengthen clarity, we will expand the derivation with an explicit proof sketch of measure invariance under the modified velocity field. revision: yes
-
Referee: [§5 (Experiments)] Table 2 (or equivalent results table): the reported improvements over base π_{0.5} on zero-shot tasks support practicality but do not directly test whether samples are drawn from the defined posterior versus an approximation; without metrics such as constraint violation rates under the exact posterior or comparison to rejection sampling baselines, the claim of exact sampling remains unverified.
Authors: We agree that the current experiments emphasize task-level performance rather than direct verification of exact posterior sampling. We will add, in the revised manuscript, a controlled comparison against rejection sampling on a low-dimensional synthetic task (where exact sampling is tractable) together with quantitative constraint-violation statistics to provide empirical support for the exactness claim. revision: yes
Circularity Check
No circularity: derivation builds on external Feynman-Kac framework
full rationale
The paper formulates posterior sampling by extending the Feynman-Kac corrector (originally for diffusion SDEs) to flow-matching ODEs, then validates on pretrained checkpoints without retraining. No equations or steps in the provided text reduce a claimed result to a fitted parameter, self-definition, or self-citation chain; the central extension is presented as a technical contribution leveraging an independent prior framework rather than re-deriving its own inputs. The method is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Janner, Y
M. Janner, Y . Du, J. Tenenbaum, and S. Levine. Planning with diffusion for flexible behavior synthesis. InInternational Conference on Machine Learning (ICML), 2022
2022
-
[2]
C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion.The International Journal of Robotics Research, 2024
2024
-
[3]
Lipman, R
Y . Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling. InInternational Conference on Learning Representations (ICLR), 2023
2023
-
[4]
Ye and M
S. Ye and M. C. Gombolay. Efficient trajectory forecasting and generation with conditional flow matching. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024
2024
-
[5]
Jiang, X
S. Jiang, X. Fang, N. Roy, T. Lozano-Pérez, L. P. Kaelbling, and S. Ancha. Streaming flow policy: Simplifying diffusion / flow-matching policies by treating action trajectories as flow trajectories. InConference on Robot Learning (CoRL), 2025
2025
-
[6]
Ratliff, M
N. Ratliff, M. Zucker, J. A. Bagnell, and S. Srinivasa. Chomp: Gradient optimization techniques for efficient motion planning. InIEEE International Conference on Robotics and Automation (ICRA). IEEE, 2009
2009
-
[7]
Schulman, J
J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel. Finding locally optimal, collision-free trajectories with sequential convex optimization. InRobotics: science and systems, 2013
2013
-
[8]
A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications. In18th European Control Conference (ECC), 2019
2019
-
[9]
Dhariwal and A
P. Dhariwal and A. Nichol. Diffusion models beat gans on image synthesis.Advances in Neural Information Processing Systems (NeurIPS), 2021
2021
-
[10]
Xiao, T.-H
W. Xiao, T.-H. Wang, C. Gan, R. Hasani, M. Lechner, and D. Rus. Safediffuser: Safe planning with diffusion probabilistic models. InThe Thirteenth International Conference on Learning Representations (ICLR), 2023. 9
2023
-
[11]
Römer, A
R. Römer, A. v. Rohr, and A. Schoellig. Diffusion predictive control with constraints. In Proceedings of the 7th Annual Learning for Dynamics & Control Conference, Proceedings of Machine Learning Research, 2025
2025
-
[12]
Lou and S
A. Lou and S. Ermon. Reflected diffusion models. InInternational Conference on Machine Learning (ICML). PMLR, 2023
2023
-
[13]
G.-H. Liu, T. Chen, E. Theodorou, and M. Tao. Mirror diffusion models for constrained and watermarked generation.Advances in Neural Information Processing Systems (NeurIPS), 2023
2023
-
[14]
W. Jung, U. A. Mishra, N. R. Arachchige, Y . Chen, D. Xu, and S. Kousik. Joint model-based model-free diffusion for planning with constraints. In9th Annual Conference on Robot Learning,
-
[15]
URLhttps://openreview.net/forum?id=E9t1ekt6W9
-
[16]
Skreta, T
M. Skreta, T. Akhound-Sadegh, V . Ohanesian, R. Bondesan, A. Aspuru-Guzik, A. Doucet, R. Brekelmans, A. Tong, and K. Neklyudov. Feynman-kac correctors in diffusion: Annealing, guidance, and product of experts. InForty-second International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=Vhc0KrcqWu
2025
-
[17]
S. Singh and I. Fischer. Stochastic sampling from deterministic flow models, 2024. URL https://arxiv.org/abs/2410.02217
arXiv 2024
-
[18]
Y . Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. InInternational Conference on Learning Representations, 2021
2021
- [19]
-
[20]
X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022
Pith/arXiv arXiv 2022
-
[21]
A. Lambert, A. Fishman, D. Fox, B. Boots, and F. Ramos. Stein variational model predictive control, 2021. URLhttps://arxiv.org/abs/2011.07641
arXiv 2021
-
[22]
S. Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review. CoRR, abs/1805.00909, 2018. URLhttp://arxiv.org/abs/1805.00909
Pith/arXiv arXiv 2018
-
[23]
Rawlik, M
K. Rawlik, M. Toussaint, and S. Vijayakumar. On stochastic optimal control and reinforcement learning by approximate inference. InTwenty-Third International Joint Conference on Artificial Intelligence, 2013
2013
-
[24]
NVIDIA, :, J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. J. Fan, Y . Fang, D. Fox, F. Hu, S. Huang, J. Jang, Z. Jiang, J. Kautz, K. Kundalia, L. Lao, Z. Li, Z. Lin, K. Lin, G. Liu, E. Llontop, L. Magne, A. Mandlekar, A. Narayan, S. Nasiriany, S. Reed, Y . L. Tan, G. Wang, Z. Wang, J. Wang, Q. Wang, J. Xiang, Y . Xie, Y . Xu, Z. Xu, S. Ye, Z. ...
Pith/arXiv arXiv 2025
-
[25]
P. Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, M. Y . Galliker, D. Ghosh, L. Groom, K. Hausman, B. Ichter, S. Jakubczak, T. Jones, L. Ke, D. LeBlanc, S. Levine, A. Li-Bell, M. Mothukuri, S. Nair, K. Pertsch, A. Z. Ren, L. X. Shi, L. Smith, J. T. Springenberg, K. Stachowicz, J. Tanner, Q. V...
Pith/arXiv arXiv 2025
-
[26]
Y . Song, L. Le, Y .-H. Park, J. Wang, J. Shi, L. Liu, J. Gu, E. Eaton, D. Jayaraman, and K. Daniilidis. Omniguide: Universal guidance fields for enhancing generalist robot policies,
-
[27]
URLhttps://arxiv.org/abs/2603.10052. 10
-
[28]
Y . Wang, L. Wang, Y . Du, B. Sundaralingam, X. Yang, Y .-W. Chao, C. Perez-D’Arpino, D. Fox, and J. Shah. Inference-time policy steering through human interactions, 2025. URL https://arxiv.org/abs/2411.16627
arXiv 2025
-
[29]
J. Long, D. Liu, W. Cai, I. Manchester, and W. Zhi. Constraining streaming flow models for adapting learned robot trajectory distributions, 2026. URL https://arxiv.org/abs/2602. 15567
2026
-
[30]
A. Millane, H. Oleynikova, E. Wirbel, R. Steiner, V . Ramasamy, D. Tingdahl, and R. Siegwart. nvblox: Gpu-accelerated incremental signed distance field mapping, 2024. URL https: //arxiv.org/abs/2311.00626
arXiv 2024
-
[31]
Mandlekar, D
A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y . Zhu, and R. Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In5th Annual Conference on Robot Learning, 2021. URL https: //openreview.net/forum?id=JrsfBJtDFdI
2021
-
[32]
B. Liu, Y . Zhu, C. Gao, Y . Feng, Q. Liu, Y . Zhu, and P. Stone. Libero: Benchmarking knowledge transfer for lifelong robot learning, 2023. URLhttps://arxiv.org/abs/2306.03310
Pith/arXiv arXiv 2023
-
[33]
A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y . Chen, K. Ellis, P. D. Fagan, J. Hejna, M. Itkina, M. Lepert, Y . J. Ma, P. T. Miller, J. Wu, S. Belkhale, S. Dass, H. Ha, A. Jain, A. Lee, Y . Lee, M. Memmel, S. Park, I. Radosavovic, K. Wang, A. Zhan, K. Black, C. Chi, K. B. Hatch, S. Lin, J. ...
Pith/arXiv arXiv 2025
-
[34]
X. Yang, R. Dagli, A. Zook, H. Hadfield, A. Goyal, S. Birchfield, F. Ramos, and J. Tremblay. Robolab: A high-fidelity simulation benchmark for analysis of task generalist policies, 2026. URLhttps://arxiv.org/abs/2604.09860
Pith/arXiv arXiv 2026
-
[35]
TheRobotStudio and H. Face. Standard Open SO-100 & SO-101 Arms. https://github. com/TheRobotStudio/SO-ARM100
-
[36]
S. H. Høeg, Y . Du, and O. Egeland. Fast policy synthesis with variable noise diffusion models. InIEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025
2025
-
[37]
Mishra and I
R. Mishra and I. R. Manchester. Eb-mbd: Emerging-barrier model-based diffusion for safe trajectory optimization in highly constrained environments. InIEEE International Conference on Robotics and Automation (ICRA). IEEE, 2026
2026
- [38]
-
[39]
L. Wu, B. Trippe, C. Naesseth, D. Blei, and J. P. Cunningham. Practical and asymptotically exact conditional sampling in diffusion models. InAdvances in Neural Information Processing Systems, 2024
2024
-
[40]
G. V . Cardoso, Y . J. El Idrissi, S. Le Corff, and E. Moulines. Monte Carlo guided diffusion for Bayesian linear inverse problems. InInternational Conference on Learning Representations, 2024. 11
2024
-
[41]
R. Singhal, Z. Horvitz, R. Teehan, M. Ren, Z. Yu, K. McKeown, and R. Ranganath. A general framework for inference-time scaling and steering of diffusion models, 2025. URL https://arxiv.org/abs/2501.06848
arXiv 2025
-
[42]
C. A. Naesseth, F. Lindsten, T. B. Schön, et al. Elements of sequential Monte Carlo.Foundations and Trends® in Machine Learning, 12(3):307–392, 2019
2019
-
[43]
Douc and O
R. Douc and O. Cappé. Comparison of resampling schemes for particle filtering. InISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, pages 64–69, 2005
2005
-
[44]
S. N. Ethier and T. G. Kurtz.Markov Processes: Characterization and Convergence. John Wiley & Sons, 2009
2009
-
[45]
Del Moral.Mean Field Simulation for Monte Carlo Integration
P. Del Moral.Mean Field Simulation for Monte Carlo Integration. Chapman and Hall, CRC press, 2013
2013
-
[46]
Rousset and G
M. Rousset and G. Stoltz. Equilibrium sampling from nonequilibrium dynamics.Journal of Statistical Physics, 123:1251–1272, 2006
2006
-
[47]
Angeli.Interacting particle approximations of Feynman-Kac measures for continuous-time jump processes
L. Angeli.Interacting particle approximations of Feynman-Kac measures for continuous-time jump processes. PhD thesis, University of Warwick, 2020
2020
-
[48]
Angeli, S
L. Angeli, S. Grosskinsky, A. M. Johansen, and A. Pizzoferrato. Rare event simulation for stochastic dynamics in continuous time.Journal of Statistical Physics, 176(5):1185–1210, 2019
2019
-
[49]
A. V . Fiacco and G. P. McCormick.Nonlinear Programming. Society for Industrial and Applied Mathematics, 1990. doi:10.1137/1.9781611971316. URL https://epubs.siam.org/doi/ abs/10.1137/1.9781611971316
-
[50]
Bertsekas.Nonlinear Programming
D. Bertsekas.Nonlinear Programming. 01 2003
2003
-
[51]
I. H. Dinwoodie. Large deviations techniques and applications (amir dembo and ofer zeitouni). SIAM Review, 36(2):303–304, 1994. doi:10.1137/1036078. URL https://doi.org/10. 1137/1036078
-
[52]
C.-R. Hwang. Laplace’s Method Revisited: Weak Convergence of Probability Measures. The Annals of Probability, 8(6):1177 – 1182, 1980. doi:10.1214/aop/1176994579. URL https://doi.org/10.1214/aop/1176994579. 12 Appendix Contents A Related Work 13 B Additional Background 14 C From Deterministic Flow to Equivalent SDE 16 D Resampling Methods 17 E The BayesFP ...
-
[53]
good” set Gδ := n x∈X:|h 1(x)| ≤δ, h 2(x)≤δ,L(x)≤ L ⋆ +δ o ,(59) and its complement (the “bad
for the FKC framework. Sequential Monte Carlo.Since our weights provide a proper weighting scheme for all intermediate distributions [40], we can leverage SMC techniques which reweight trajectories along their simulation. In practice, we find that resampling only over an ‘active interval’ t∈[t min, tmax] is useful for 17 improving sample quality and prese...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.