Sample-Efficient and Smooth Cross-Entropy Method Model Predictive Control Using Deterministic Samples

Daniel Frisch; Markus Walker; Uwe D. Hanebeck

arXiv: 2510.05706 · v2 · submitted 2025-10-07 · eess.SY · cs.SY

Pith reviewed 2026-05-18 09:37 UTC · model grok-4.3

keywords: deterministic sampling · cross-entropy method · model predictive control · nonlinear control · sample efficiency · control smoothness · localized cumulative distributions

The pith

Replacing random sampling with deterministic samples from localized cumulative distributions makes cross-entropy MPC more sample-efficient and produces smoother controls.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces deterministic sampling CEM, or dsCEM, to replace random sampling in cross-entropy method model predictive control. It generates samples using localized cumulative distributions with added temporal correlations to promote smoothness. The approach is designed as a direct substitute for the sampling step in existing controllers. Experiments on two nonlinear control tasks show lower cumulative cost and smoother control inputs than iCEM, especially when few samples are used. A sympathetic reader would care because it addresses the inefficiency and jerkiness that have limited gradient-free control methods in practice.

Core claim

The central claim is that deterministic samples derived from localized cumulative distributions, adapted with modular schemes and incorporating temporal correlations, can replace random sampling in CEM-MPC to achieve lower cumulative costs and smoother control inputs with fewer samples.

What carries the argument

Localized cumulative distributions (LCDs) that generate deterministic sample sets with temporal correlations for smooth trajectories.
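
To make "temporal correlations for smooth trajectories" concrete, the following is a minimal sketch of one way to correlate a sample set over the planning horizon. It assumes an AR(1)-style covariance with entries `rho**|i-j|` and a Cholesky transform; both are illustrative choices, not the scheme described in the paper. The names `ar1_covariance`, `correlate`, and `lag1_product` are hypothetical, and the white samples are drawn from a seeded random generator only for illustration, where dsCEM would use a deterministic sample set.

```python
import math
import random

def ar1_covariance(horizon, rho):
    """Covariance with entries rho**|i-j| (assumed AR(1)-style correlation)."""
    return [[rho ** abs(i - j) for j in range(horizon)] for i in range(horizon)]

def cholesky(a):
    """Lower-triangular factor L with L L^T = a, for a small SPD matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def correlate(white, low):
    """Map one uncorrelated sample (length H) to a correlated one: L @ white."""
    return [sum(low[i][k] * white[k] for k in range(i + 1)) for i in range(len(white))]

def lag1_product(samples):
    """Average of s[t] * s[t+1] over all samples and steps."""
    prods = [s[t] * s[t + 1] for s in samples for t in range(len(s) - 1)]
    return sum(prods) / len(prods)

horizon = 20
rng = random.Random(0)  # illustration only; a deterministic sample set would go here
low = cholesky(ar1_covariance(horizon, rho=0.9))
white = [[rng.gauss(0.0, 1.0) for _ in range(horizon)] for _ in range(200)]
smooth = [correlate(w, low) for w in white]
print("lag-1 product, uncorrelated:", round(lag1_product(white), 3))
print("lag-1 product, correlated:  ", round(lag1_product(smooth), 3))
```

Because the correlation is applied as a linear map on an existing sample set, it leaves the way the base samples are produced untouched, which is one reason such a step can be kept modular.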

If this is right

dsCEM can be used as a drop-in replacement in existing CEM-based controllers.
It outperforms iCEM in cumulative cost and smoothness in low-sample regimes.
It reduces the need for large sample counts to achieve satisfactory results.
It incorporates temporal correlations directly into sample generation to ensure smooth control trajectories.
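
The "drop-in replacement" point can be sketched as a CEM planning step that takes its sampler as an argument. This is a minimal illustration under stated assumptions, not the authors' implementation: `make_fixed_sampler` is a simple stand-in that reuses one fixed set of normal quantiles and is not the LCD-based construction, `rollout_cost` is a hypothetical toy task, and all function names are invented for this example.

```python
import random
from statistics import NormalDist, mean, pstdev

def make_random_sampler(n, horizon, seed=0):
    """Baseline: i.i.d. standard-normal samples, redrawn on every call."""
    rng = random.Random(seed)
    return lambda: [[rng.gauss(0.0, 1.0) for _ in range(horizon)] for _ in range(n)]

def make_fixed_sampler(n, horizon, seed=0):
    """Stand-in deterministic sampler (NOT the LCD-based construction).

    Each time step gets n equally spaced standard-normal quantiles whose order
    is shuffled once with a fixed seed, so every call returns the same set.
    """
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    rng = random.Random(seed)
    columns = [rng.sample(quantiles, n) for _ in range(horizon)]
    samples = [[columns[t][i] for t in range(horizon)] for i in range(n)]
    return lambda: samples

def cem_plan(cost, sampler, horizon, iterations=5, n_elite=5, init_std=1.0):
    """Plain CEM over an input sequence; sampler() returns standard-normal samples."""
    mu = [0.0] * horizon
    sigma = [init_std] * horizon
    for _ in range(iterations):
        base = sampler()
        # Shift and scale the base samples by the current mean and spread.
        candidates = [[m + s * z for m, s, z in zip(mu, sigma, row)] for row in base]
        elites = sorted(candidates, key=cost)[:n_elite]
        mu = [mean(e[t] for e in elites) for t in range(horizon)]
        sigma = [pstdev([e[t] for e in elites]) + 1e-6 for t in range(horizon)]
    return mu

def rollout_cost(u_seq, x0=0.0, target=1.0, dt=0.1):
    """Toy cost for a 1-D integrator x_{t+1} = x_t + dt * u_t (hypothetical task)."""
    x, total = x0, 0.0
    for u in u_seq:
        x += dt * u
        total += (x - target) ** 2 + 0.01 * u ** 2
    return total

horizon, n = 10, 20
plan_random = cem_plan(rollout_cost, make_random_sampler(n, horizon, seed=1), horizon)
plan_fixed = cem_plan(rollout_cost, make_fixed_sampler(n, horizon, seed=1), horizon)
print("cost with random sampler:", round(rollout_cost(plan_random), 3))
print("cost with fixed sampler: ", round(rollout_cost(plan_fixed), 3))
```

Only the `sampler` argument changes between the two calls; the elite selection and the mean and spread updates stay the same, which is the sense in which a sampling scheme can be swapped in.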

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The deterministic approach might apply to other sampling-based optimization methods in robotics and planning.
Lower sample counts could enable real-time deployment on hardware with limited compute.
Similar localized distribution techniques could address efficiency-smoothness tradeoffs in related stochastic control settings.

Load-bearing premise

Samples drawn from the proposed localized cumulative distributions with added temporal correlations will maintain sufficient exploration of the solution space while guaranteeing smoothness without needing large sample counts.

What would settle it

A test where dsCEM fails to match or exceed iCEM performance on the same nonlinear control tasks in the low-sample regime, or produces less smooth inputs as measured by some metric.
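
Since the test above depends on fixing a smoothness metric, here is a minimal sketch of one common choice, the mean squared change between consecutive control inputs. The metric actually used in the paper is not stated on this page, so this choice and the name `mean_squared_input_change` are assumptions.

```python
def mean_squared_input_change(u_seq):
    """Average of (u[t+1] - u[t])**2 over the sequence; 0 for a constant input."""
    if len(u_seq) < 2:
        raise ValueError("need at least two inputs")
    diffs = [(b - a) ** 2 for a, b in zip(u_seq, u_seq[1:])]
    return sum(diffs) / len(diffs)

jerky = [1.0, -1.0, 1.0, -1.0, 1.0]    # alternating input
ramp = [0.0, 0.25, 0.5, 0.75, 1.0]     # slowly changing input
print(mean_squared_input_change(jerky))  # 4.0
print(mean_squared_input_change(ramp))   # 0.0625
```

Lower values indicate smoother inputs, so a comparison between two controllers would evaluate this quantity on the applied input sequences of both under the same sample budget.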

Figures

Figures reproduced from arXiv: 2510.05706 by Daniel Frisch, Markus Walker, Uwe D. Hanebeck.

**Figure 2.** Example of 25 two-dimensional deterministic samples, where the background color indicates the PDF. We then obtain a general distance measure between two PDFs by comparing their respective LCDs with a modified Cramér–von Mises (CvM) distance [5] $D_{\mathrm{CvM}} = \int_{0}^{\infty} w(b) \int_{\mathbb{R}^{d_\xi}} \left( \tilde{F}(\underline{m}, b) - F(\underline{m}, b) \right)^2 \mathrm{d}\underline{m} \, \mathrm{d}b$, where $\tilde{F}(\underline{m}, b)$ and $F(\underline{m}, b)$ are the LCDs of $\tilde{f}(\underline{\xi})$ and $f(\underline{\xi})$, respectively, and $w(b)$ is a…

**Figure 3.** The results for the Mountain Car Task are given for…

**Figure 4.** Results for the cart-pole task. The different methods’…
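
For orientation, the LCD that the Figure 2 caption refers to can be written in a kernel form. The block below is a sketch that assumes the Gaussian-type kernel with a single width $b$ often used together with [5] and [6]; the kernel choice and the symbol $K$ are assumptions and are not taken from this page. The weighting function $w(b)$ is left unspecified because the caption is truncated.

```latex
F(\underline{m}, b)
  = \int_{\mathbb{R}^{d_\xi}} f(\underline{\xi}) \, K(\underline{\xi} - \underline{m}, b) \, \mathrm{d}\underline{\xi},
\qquad
K(\underline{\xi} - \underline{m}, b)
  = \exp\!\left( -\frac{\lVert \underline{\xi} - \underline{m} \rVert^{2}}{2 b^{2}} \right)
```

Under this reading, $\underline{m}$ is the kernel location and $b$ its width, and the CvM distance compares the two LCDs over all locations and widths.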

Abstract

Cross-entropy method model predictive control (CEM-MPC) is a powerful gradient-free technique for nonlinear optimal control, but its performance is often limited by the reliance on random sampling. This conventional approach can lead to inefficient exploration of the solution space and non-smooth control inputs, requiring a large number of samples to achieve satisfactory results. To address these limitations, we propose deterministic sampling CEM (dsCEM), a novel framework that replaces the random sampling step with deterministic samples derived from localized cumulative distributions (LCDs). Our approach introduces modular schemes to generate and adapt these sample sets, incorporating temporal correlations to ensure smooth control trajectories. This method can be used as a drop-in replacement for the sampling step in existing CEM-based controllers. Experimental evaluations on two nonlinear control tasks demonstrate that dsCEM consistently outperforms state-of-the-art iCEM in terms of cumulative cost and control input smoothness, particularly in the critical low-sample regime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes deterministic sampling CEM (dsCEM) as a drop-in replacement for the random sampling step in CEM-MPC. It replaces random samples with deterministic samples drawn from localized cumulative distributions (LCDs), augmented by modular adaptation schemes and temporal correlations to promote smoothness. The central empirical claim is that dsCEM outperforms the state-of-the-art iCEM baseline on two nonlinear control tasks, with particular gains in cumulative cost and input smoothness in the low-sample regime.

Significance. If the reported gains hold under scrutiny, the approach could meaningfully lower the sample budget required for practical CEM-MPC while simultaneously improving trajectory smoothness. The modular construction of the LCD-based samplers is a positive design feature that facilitates integration into existing controllers.

major comments (2)

[Abstract and §4 (Experimental Evaluation)] The manuscript asserts consistent outperformance on two tasks but supplies no numerical values for sample counts N, cost-function definitions, dynamics models, or statistical significance tests. Without these details it is impossible to determine whether the low-sample-regime gains are robust or the result of post-hoc tuning.
[§3 (Deterministic Sampling via LCDs)] The construction of localized cumulative distributions plus explicit temporal correlation necessarily reduces trajectory diversity relative to i.i.d. random sampling. No diversity metric, coverage bound, or ablation is provided to demonstrate that the effective support of the sampled set remains adequate once N falls below the regime where random sampling succeeds; this directly bears on whether the reported gains generalize or are task-specific.

minor comments (2)

[§3] Notation for the LCD adaptation rules should be made fully explicit (e.g., how the localization parameters are updated between CEM iterations).
[Figures] Figure captions should state the exact sample budgets N used in each panel so that the low-sample-regime claim can be verified at a glance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major point below and will incorporate clarifications and additional analyses in the revised manuscript.

Referee: [Abstract and §4 (Experimental Evaluation)] The manuscript asserts consistent outperformance on two tasks but supplies no numerical values for sample counts N, cost-function definitions, dynamics models, or statistical significance tests. Without these details it is impossible to determine whether the low-sample-regime gains are robust or the result of post-hoc tuning.

Authors: We agree that the presentation would benefit from greater explicitness. In the revised version we will state the exact sample budgets N used in each experiment, provide the full mathematical definitions of the cost functions, specify the nonlinear dynamics models for both tasks, and report statistical significance results (including p-values from paired t-tests) to confirm that the observed gains in cumulative cost and smoothness are robust rather than artifacts of post-hoc tuning.
Revision: yes
Referee: [§3 (Deterministic Sampling via LCDs)] The construction of localized cumulative distributions plus explicit temporal correlation necessarily reduces trajectory diversity relative to i.i.d. random sampling. No diversity metric, coverage bound, or ablation is provided to demonstrate that the effective support of the sampled set remains adequate once N falls below the regime where random sampling succeeds; this directly bears on whether the reported gains generalize or are task-specific.

Authors: The localized and temporally correlated sampling is deliberately designed to concentrate samples in promising regions while enforcing smoothness; this necessarily trades some i.i.d. diversity for efficiency. Nevertheless, the modular adaptation of the LCDs is intended to preserve adequate coverage. To substantiate this claim we will add, in the revised §3, a quantitative diversity metric (average minimum pairwise distance among samples) together with an ablation that isolates the effect of the temporal-correlation term, thereby showing that support remains sufficient for the low-N regime on the evaluated tasks.
Revision: yes
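
The diversity metric named in the response, average minimum pairwise distance among samples, can be sketched as follows. This assumes Euclidean distance between sample vectors, which the response does not specify; the function name and the two small point sets are illustrative only.

```python
import math

def avg_min_pairwise_distance(samples):
    """Mean over samples of the Euclidean distance to the nearest other sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    nearest = []
    for i, a in enumerate(samples):
        nearest.append(min(math.dist(a, b) for j, b in enumerate(samples) if j != i))
    return sum(nearest) / len(nearest)

spread = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]   # evenly spread points
clumped = [(0.0, 0.0), (0.0, 0.1), (1.0, 1.0), (1.0, 1.1)]  # two tight pairs
print(avg_min_pairwise_distance(spread))   # 1.0
print(avg_min_pairwise_distance(clumped))  # about 0.1
```

A larger value means samples keep more distance from their nearest neighbors. The metric says nothing about whether the region the samples cover contains good solutions, so it complements rather than replaces the proposed ablation.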

Circularity Check

0 steps flagged

No significant circularity; claims rest on experimental comparison of proposed sampling method

The paper introduces dsCEM as an algorithmic replacement for random sampling in CEM-MPC, using deterministic samples from localized cumulative distributions with added temporal correlations. Its central claims are supported by direct experimental comparisons on two nonlinear control tasks, showing improved cumulative cost and smoothness in the low-sample regime versus iCEM. No derivation chain, fitted-parameter prediction, or self-citation load-bearing step is present that reduces the result to its own inputs by construction. The approach is self-contained as a modular sampling scheme whose performance is externally validated through empirical evaluation rather than internal redefinition or renaming.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach uses localized cumulative distributions (LCDs), which predate this paper (see [5]), as a new mechanism for sample generation in CEM-MPC and assumes that temporal correlation can be added without destroying the optimization properties of CEM; these are not standard background results for CEM-MPC and constitute the main added machinery.

axioms (1)

Domain assumption: Deterministic samples derived from localized cumulative distributions can replace random samples while preserving or improving exploration quality in CEM-MPC.
This premise is required for the performance claims but is not derived from first principles in the abstract.

invented entities (1)

Localized cumulative distributions (LCDs): no independent evidence
purpose: To generate deterministic sample sets for the cross-entropy update step.
New sampling mechanism introduced to replace random sampling; no independent evidence outside the paper is provided in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "replaces the random sampling step with deterministic samples derived from localized cumulative distributions (LCDs)... incorporating temporal correlations to ensure smooth control trajectories"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "Experimental evaluations on two nonlinear control tasks demonstrate that dsCEM consistently outperforms state-of-the-art iCEM"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 1 internal anchor

[1] J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model Predictive Control: Theory, Computation, and Design, 2nd ed. Madison, WI, USA: Nob Hill Publishing, 2017.

[2] C. Pinneri, S. Sawant, S. Blaes, J. Achterhold, J. Stueckler, M. Rolinek, and G. Martius, “Sample-efficient cross-entropy method for real-time planning,” in Proceedings of the 2020 Conference on Robot Learning, vol. 155, Nov. 2021, pp. 1049–1065.

[3] R. Rubinstein, “The cross-entropy method for combinatorial and continuous optimization,” Methodology and Computing in Applied Probability, vol. 1, no. 2, pp. 127–190, 1999.

[4] K. Chua, R. Calandra, R. McAllister, and S. Levine, “Deep reinforcement learning in a handful of trials using probabilistic dynamics models,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018, pp. 4759–4770.

[5] U. D. Hanebeck and V. Klumpp, “Localized cumulative distributions and a multivariate generalization of the Cramér-von Mises distance,” in Proceedings of the 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2008), Seoul, Republic of Korea, August 2008, pp. 33–39.

[6] U. D. Hanebeck, M. F. Huber, and V. Klumpp, “Dirac mixture approximation of multivariate Gaussian densities,” in Proceedings of the 2009 IEEE Conference on Decision and Control (CDC 2009), Shanghai, China, December 2009.

[7] M. Diehl, H. Bock, H. Diedam, and P.-B. Wieber, “Fast direct multiple shooting algorithms for optimal robot control,” in Fast Motions in Biomechanics and Robotics: Optimization and Feedback Control. Berlin, Heidelberg: Springer, 2006, pp. 65–93.

[8] D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming, ser. Modern Analytic and Computational Methods in Science and Mathematics. New York, NY: American Elsevier Publ, 1970, no. 24.

[9] W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems,” in Proceedings of the First International Conference on Informatics in Control, Automation and Robotics, Setúbal, Portugal, 2004, pp. 222–229.

[10] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, “Aggressive driving with model predictive path integral control,” in Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 1433–1440.

[11] P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, “A tutorial on the cross-entropy method,” Annals of Operations Research, vol. 134, no. 1, pp. 19–67, Feb. 2005.

[12] D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson, “Learning latent dynamics for planning from pixels,” in Proceedings of the 36th International Conference on Machine Learning, vol. 97, Jun. 2019, pp. 2555–2565.

[13] J. Watson and J. Peters, “Inferring smooth control: Monte Carlo posterior policy iteration with Gaussian processes,” in Proceedings of the 6th Conference on Robot Learning, vol. 205, Dec. 2023, pp. 67–79.

[14] T. Power and D. Berenson, “Learning a generalizable trajectory sampling distribution for model predictive control,” IEEE Transactions on Robotics, vol. 40, pp. 2111–2127, 2024.

[15] J. Steinbring and U. D. Hanebeck, “S2KF: The smart sampling Kalman filter,” in Proceedings of the 16th International Conference on Information Fusion (Fusion 2013), Istanbul, Turkey, July 2013.

[16] J. Steinbring and U. D. Hanebeck, “LRKF revisited: The smart sampling Kalman filter (S2KF),” Journal of Advances in Information Fusion, vol. 9, no. 2, pp. 106–123, December 2014.

[17] J. Steinbring, M. Pander, and U. D. Hanebeck, “The smart sampling Kalman filter with symmetric samples,” Journal of Advances in Information Fusion, vol. 11, no. 1, pp. 71–90, June 2016.

[18] C. A. León, J.-C. Massé, and L.-P. Rivest, “A statistical model for random rotations,” Journal of Multivariate Analysis, vol. 97, no. 2, pp. 412–430, 2006.

[19] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall, Inc., 1993.

[20] S. Julier and J. Uhlmann, “Reduced sigma point filters for the propagation of means and covariances through nonlinear transformations,” in Proceedings of the 2002 American Control Conference, Anchorage, AK, USA, 2002, pp. 887–892.

[21] M. Towers, A. Kwiatkowski, J. Terry, Balis et al., “Gymnasium: a standard interface for reinforcement learning environments,” arXiv preprint arXiv:2407.17032, 2024.

[22] A. G. Barto, R. S. Sutton, and C. W. Anderson, “Neuronlike adaptive elements that can solve difficult learning control problems,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-13, no. 5, pp. 834–846, 1983.
