Model Predictive Control via Probabilistic Inference: A Tutorial and Survey

Kohei Honda

arXiv: 2511.08019 · v4 · submitted 2025-11-11 · cs.RO · cs.SY · eess.SY

Pith reviewed 2026-05-17 23:58 UTC · model grok-4.3

classification: cs.RO · cs.SY · eess.SY

keywords: model predictive control, probabilistic inference, variational inference, optimal control, robotics, Boltzmann distribution, MPPI, sampling-based control

The pith

Finite-horizon optimal control can be reformulated as variational inference over a Boltzmann distribution of optimal actions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that model predictive control problems can be solved by recasting the search for optimal actions as inference over a probability distribution. It derives how this distribution takes the form of a Boltzmann distribution that weights costs against a prior over controls. Actions are generated by applying variational inference to draw samples from this distribution. The approach unifies existing methods such as Model Predictive Path Integral control, which admits a closed-form sampling update. A reader would care because the reformulation provides a single conceptual frame for handling costs, priors, constraints, and multi-modality in robotics tasks.

Core claim

PI-MPC reformulates finite-horizon optimal control as inference over an optimal control distribution expressed as a Boltzmann distribution weighted by a control prior, and generates actions through variational inference. The tutorial part derives the formulation and shows how Model Predictive Path Integral control arises as a representative case with a closed-form sampling update. The survey part organizes existing work along design dimensions that include prior design, multi-modality, constraint handling, scalability, hardware acceleration, and theoretical analysis.
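The closed-form sampling update mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `mppi_update` and `toy_cost`, the isotropic Gaussian perturbations, and the 1-D integrator task are all assumptions made for illustration, and the sketch omits the control-cost correction term that full MPPI derivations include when the sampling distribution is shifted away from the prior.

```python
import numpy as np

def mppi_update(mean, sigma, cost_fn, lam=1.0, num_samples=256, rng=None):
    """One simplified MPPI-style update of the mean control sequence.

    mean    : (T, m) nominal control sequence
    sigma   : scalar std of the Gaussian perturbations (assumed isotropic)
    cost_fn : maps a (T, m) control sequence to a scalar trajectory cost
    lam     : temperature of the Boltzmann weights
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample K control sequences around the current mean.
    noise = rng.normal(0.0, sigma, size=(num_samples,) + mean.shape)
    samples = mean[None] + noise
    costs = np.array([cost_fn(u) for u in samples])
    # Boltzmann weights; subtracting the minimum cost avoids underflow.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    # Closed-form update: weighted average of the sampled sequences.
    new_mean = np.tensordot(weights, samples, axes=1)
    return new_mean, weights

def toy_cost(u, x0=0.0, target=1.0, dt=0.1):
    """Hypothetical task: drive a 1-D integrator x' = u toward `target`."""
    x = x0 + dt * np.cumsum(u[:, 0])
    return float(np.sum((x - target) ** 2))

rng = np.random.default_rng(0)
mean = np.zeros((20, 1))
for _ in range(30):
    mean, w = mppi_update(mean, 1.0, toy_cost, lam=0.5, num_samples=512, rng=rng)
print("cost of zero controls:", toy_cost(np.zeros((20, 1))))
print("cost after updates:   ", toy_cost(mean))
```

In a receding-horizon loop one would typically apply the first control of `mean`, shift the sequence by one step, and repeat the update at the next time step.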

What carries the argument

The Boltzmann distribution over controls that encodes the optimal control objective as an energy weighted by a control prior, with variational inference used to generate actions.
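One way to see how these pieces connect is the following worked sketch. It assumes, for illustration only, that the variational family is a Gaussian with fixed covariance $\Sigma$ and that the $K$ samples $u^{(k)}_{0:T-1}$ are drawn from the prior $p$; the symbols $\mu$, $\Sigma$, $K$, and $w_k$ are introduced here and may not match the paper's notation.

```latex
\begin{aligned}
\pi^*(u_{0:T-1}) &= Z^{-1}\exp\!\left(-\lambda^{-1} J_\tau(u_{0:T-1})\right) p(u_{0:T-1}),\\
\mu^* &= \arg\min_{\mu}\ \mathrm{KL}\!\left(\pi^* \,\middle\|\, \mathcal{N}(\mu,\Sigma)\right)
       = \mathbb{E}_{\pi^*}\!\left[u_{0:T-1}\right],\\
\mu^* &\approx \sum_{k=1}^{K} w_k\, u^{(k)}_{0:T-1},
\qquad
w_k = \frac{\exp\!\left(-\lambda^{-1} J_\tau\!\left(u^{(k)}_{0:T-1}\right)\right)}
           {\sum_{j=1}^{K}\exp\!\left(-\lambda^{-1} J_\tau\!\left(u^{(j)}_{0:T-1}\right)\right)},
\qquad u^{(k)}_{0:T-1}\sim p .
\end{aligned}
```

The second line is moment matching: for a fixed-covariance Gaussian, the forward KL divergence is minimized by setting the mean to the expectation under $\pi^*$. The third line estimates that expectation by self-normalized importance sampling, where the prior cancels in the ratio $\pi^*/p$ and only the exponentiated cost remains.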

Load-bearing premise

The original optimal control objective can be faithfully captured by inference in a Boltzmann distribution without introducing significant bias or losing key optimality properties of the underlying problem.

What would settle it

Direct comparison of achieved costs and constraint violations between PI-MPC and a standard numerical MPC solver on the same finite-horizon benchmark problems would test whether the inference step preserves the original optimality properties.

Figures

Figures reproduced from arXiv: 2511.08019 by Kohei Honda.

**Figure 2.** A comparison of sampling-based MPC methods in a vehicle obstacle avoid…

**Figure 3.** Graphical model representation of the optimal control problem.

**Figure 4.** Cost function and Boltzmann distribution. The Boltzmann distribution…

**Figure 5.** Variation of the optimal control distribution with temperature parameter

**Figure 6.** Example of decision-making through symmetry breaking.

**Figure 7.** Variation of the optimal control distribution with the prior distribution. The…

Original abstract

This paper presents a tutorial and survey on Probabilistic Inference-based Model Predictive Control (PI-MPC). PI-MPC reformulates finite-horizon optimal control as inference over an optimal control distribution expressed as a Boltzmann distribution weighted by a control prior, and generates actions through variational inference. In the tutorial part, we derive this formulation and explain action generation via variational inference, highlighting Model Predictive Path Integral (MPPI) control as a representative algorithm with a closed-form sampling update. In the survey part, we organize existing PI-MPC research around key design dimensions, including prior design, multi-modality, constraint handling, scalability, hardware acceleration, and theoretical analysis. This paper provides a unified conceptual perspective on PI-MPC and a practical entry point for researchers and practitioners in robotics and other control applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a clear tutorial and survey that organizes PI-MPC literature around design dimensions but introduces no new results or theorems.

The paper is a tutorial and survey on Probabilistic Inference-based Model Predictive Control. It shows how to recast finite-horizon optimal control as inference over a Boltzmann distribution that includes a control prior, then generates actions with variational inference. MPPI appears as the running example with its closed-form sampling step. The survey section groups prior work by prior design, multi-modality, constraint handling, scalability, hardware acceleration, and theoretical analysis.

That structure is the main contribution. It gives readers a single map for comparing methods instead of piecing together scattered papers. The derivations stay faithful to existing MPPI-style work and do not contain obvious errors or unsupported steps. The high-level framing is consistent.

Soft spots are limited. Any survey stands or falls on coverage, so the real test is whether the reference list and categorization catch the main variants in constraint handling and multi-modal cases. The abstract gives no sign of major omissions or misclassifications, but a referee could still check for recent edge cases that got left out.

This paper suits researchers and practitioners who are new to the probabilistic view of MPC or who need a practical reference when choosing or extending an algorithm. It is less useful for experts already deep in the citations. The work deserves peer review because a well-organized entry point like this can save time for people entering the area even without new theorems. I would send it out for referee comments.

Referee Report

1 major / 2 minor

Summary. The paper presents a tutorial and survey on Probabilistic Inference-based Model Predictive Control (PI-MPC). It reformulates finite-horizon optimal control as inference over an optimal control distribution expressed as a Boltzmann distribution weighted by a control prior, with actions generated through variational inference. The tutorial derives the formulation and presents Model Predictive Path Integral (MPPI) control as a representative algorithm with a closed-form sampling update. The survey organizes existing work around design dimensions including prior design, multi-modality, constraint handling, scalability, hardware acceleration, and theoretical analysis, providing a unified conceptual perspective for robotics and control applications.

Significance. If the categorization of the literature is accurate and reasonably comprehensive, this work offers a useful service by unifying an established line of research at the intersection of optimal control and probabilistic inference. The explicit tutorial derivation and the organization by design dimensions provide a practical on-ramp for newcomers while highlighting open questions in areas such as constraint handling and theoretical guarantees. These features could help consolidate the field and guide subsequent algorithm development.

major comments (1)

[Tutorial part] Tutorial derivation (around the Boltzmann-distribution reformulation): the equivalence between the original optimal-control objective and inference in the weighted Boltzmann distribution is presented as direct, yet the discussion of conditions under which this mapping preserves key optimality properties (especially for non-convex or state-dependent costs) remains high-level; a concrete statement of the approximation error or the role of the temperature parameter would strengthen the central framing.

minor comments (2)

[Survey part] A summary table listing the surveyed design dimensions together with representative algorithms and key references would improve navigability for readers.
Notation for the control prior and the variational distribution is introduced gradually; consolidating the main symbols in a single early table or list would aid readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of the paper and for the constructive suggestion regarding the tutorial derivation. We address the comment below and agree to strengthen the presentation in the revised manuscript.

Referee: [Tutorial part] Tutorial derivation (around the Boltzmann-distribution reformulation): the equivalence between the original optimal-control objective and inference in the weighted Boltzmann distribution is presented as direct, yet the discussion of conditions under which this mapping preserves key optimality properties (especially for non-convex or state-dependent costs) remains high-level; a concrete statement of the approximation error or the role of the temperature parameter would strengthen the central framing.

Authors: We thank the referee for highlighting this point. The tutorial presents the Boltzmann reformulation as an exact equivalence to the maximum-entropy optimal-control objective when the cost is additive and the prior is chosen to encode control effort; this is standard in the PI-MPC literature and holds without approximation for the soft-optimal distribution. For non-convex or state-dependent costs the mapping remains formally valid for the soft objective, but the subsequent variational inference step (especially in sampling-based realizations such as MPPI) introduces a practical approximation whose error depends on sample count and temperature. The temperature parameter explicitly trades off optimality and exploration and can be viewed as a regularization strength; lower values recover harder optimality at the cost of higher variance in the sampling estimator. We agree that a concise paragraph making these distinctions explicit, together with a brief reference to existing error bounds in the variational inference literature, would improve clarity. We will add this discussion in the revised tutorial section.

Revision: yes
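The temperature trade-off described in the response can be made concrete with a small numerical sketch. It uses synthetic, uniformly drawn costs as a stand-in for trajectory costs (an assumption for illustration, not data from the paper) and reports the Kish effective sample size of the Boltzmann weights; the helper names `boltzmann_weights` and `effective_sample_size` are hypothetical.

```python
import numpy as np

def boltzmann_weights(costs, lam):
    """Normalized weights w_k proportional to exp(-J_k / lam), computed stably."""
    z = -(costs - costs.min()) / lam
    w = np.exp(z)
    return w / w.sum()

def effective_sample_size(weights):
    """Kish effective sample size: 1 / sum_k w_k^2 (between 1 and K)."""
    return 1.0 / np.sum(weights ** 2)

rng = np.random.default_rng(0)
costs = rng.uniform(0.0, 10.0, size=1000)  # hypothetical trajectory costs

temperatures = (0.01, 0.1, 1.0, 10.0, 100.0)
ess = [effective_sample_size(boltzmann_weights(costs, lam)) for lam in temperatures]
for lam, e in zip(temperatures, ess):
    print(f"lambda={lam:>6}: effective sample size = {e:8.1f} of {costs.size}")
```

As the temperature falls, the weights concentrate on the few lowest-cost samples and the effective sample size shrinks, which is one way to read the "higher variance" remark; as it rises, the weights approach uniform and the update averages over nearly all samples.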

Circularity Check

0 steps flagged

Tutorial and survey with no load-bearing circular derivations

This is a tutorial and survey paper that reformulates MPC as probabilistic inference by referencing established prior literature on MPPI and variational inference. The derivation in the tutorial section follows standard Boltzmann distribution and VI steps from external sources rather than reducing to any fitted parameter, self-definition, or self-citation chain within the paper. No equation or claim is shown to be equivalent to its inputs by construction. Minor self-citations are present but not load-bearing for the central organization of design dimensions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is primarily a survey and tutorial; it draws on standard probabilistic inference techniques already established in the literature and introduces no new free parameters, axioms, or invented entities in the abstract description.

axioms (1)

domain assumption: Variational inference provides a tractable approximation to the optimal control distribution.
Invoked in the tutorial section for generating actions from the Boltzmann distribution.

pith-pipeline@v0.9.0 · 5425 in / 1163 out tokens · 33242 ms · 2026-05-17T23:58:19.457585+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

$\pi^*(u_{0:T-1}) = Z^{-1} \exp\left(-\lambda^{-1} J_\tau(u_{0:T-1})\right) p(u_{0:T-1})$ (Boltzmann distribution weighted by prior)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

Derivation via graphical model, optimality likelihood, and KL minimization (Eqs. 3–5)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reset-Free Reinforcement Learning for Real-World Agile Driving: An Empirical Study
cs.RO 2026-04 unverdicted novelty 5.0

Empirical comparison shows a clear sim-to-real gap in reset-free RL for agile driving: TD-MPC2 outperforms the MPPI baseline in the real world while SAC excels in simulation, and residual learning benefits simulation ...

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] Hilbert J Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of statistical mechanics: theory and experiment, 2005(11):P11011, 2005.

[2] Grady Williams, Paul Drews, Brian Goldfain, James M Rehg, and Evangelos A Theodorou. Information-theoretic model predictive control: Theory and applications to autonomous driving. IEEE Transactions on Robotics, 34(6):1603–1622, 2018.

[3] Sergey Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv preprint arXiv:1805.00909, 2018.

[4] Ziyi Wang, Oswin So, Jason Gibson, Bogdan Vlahov, Manan Gandhi, Guan-Horng Liu, and Evangelos Theodorou. Variational Inference MPC using Tsallis Divergence. In Proceedings of Robotics: Science and Systems, July 2021.

[5] Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, and Martin Riedmiller. Maximum a posteriori policy optimisation. In International Conference on Learning Representations. PMLR, 2018.

[6] Kohei Honda, Naoki Akai, Kosuke Suzuki, Mizuho Aoki, Hirotaka Hosogaya, Hiroyuki Okuda, and Tatsuya Suzuki. Stein variational guided model predictive path integral control: Proposal and experiments with fast maneuvering vehicles. In International Conference on Robotics and Automation, pages 7020–7026. IEEE, 2024.

[7] N Hansen, X Wang, and H Su. Temporal difference learning for model predictive control. In International Conference on Machine Learning. PMLR, 2022.

[8] Masashi Okada and Tadahiro Taniguchi. Variational inference MPC for Bayesian model-based reinforcement learning. In Conference on robot learning, pages 258–272. PMLR, 2020.

[9] Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36:53728–53741, 2023.

[10] Zdravko I Botev, Dirk P Kroese, Reuven Y Rubinstein, and Pierre L’ecuyer. The cross-entropy method for optimization. In Handbook of statistics, volume 31, pages 35–59. Elsevier, 2013.

[11] Yann LeCun, Sumit Chopra, Raia Hadsell, Marc’Aurelio Ranzato, and Fu Jie Huang. A tutorial on energy-based learning. Predicting Structured Data, 1:0, 2006.

[12] Kohei Honda. MPPI Playground: Model predictive path integral control with pytorch. https://github.com/kohonda/mppi_playground

[13] Zeji Yi, Chaoyi Pan, Guanqi He, Guannan Qu, and Guanya Shi. CoVO-MPC: Theoretical analysis of sampling-based mpc and optimal covariance design. In Learning for Dynamics & Control Conference, pages 1122–1135. PMLR, 2024.

[14] Thomas Power and Dmitry Berenson. Variational inference mpc using normalizing flows and out-of-distribution projection. Robotics: Science and Systems, 2022.

[15] Alexander Lambert, Fabio Ramos, Byron Boots, Dieter Fox, and Adam Fishman. Stein variational model predictive control. In Conference on Robot Learning, pages 1278–1297. PMLR, 2021.

[16] Taisuke Kobayashi and Kota Fukumoto. Real-time sampling-based model predictive control based on reverse kullback-leibler divergence and its adaptive acceleration. arXiv preprint arXiv:2212.04298, 2022.

[17] Haoru Xue, Chaoyi Pan, Zeji Yi, Guannan Qu, and Guanya Shi. Full-order sampling-based MPC for torque-level locomotion control via diffusion-style annealing. In International Conference on Robotics and Automation, pages 4974–4981. IEEE, 2025.
