Safety-Critical Contextual Control via Online Riemannian Optimization with World Models

Tongxin Li

arXiv: 2604.19639 · v1 · submitted 2026-04-21 · eess.SY · cs.AI · cs.SY

Pith reviewed 2026-05-10 01:50 UTC · model grok-4.3

keywords: safety-critical control, contextual optimization, Riemannian geometry, world models, penalized predictive control, feasibility manifold, score-based density, barrier curvature

The pith

A score-based density from black-box feasibility samples endows the action space with a Riemannian geometry that bounds how far optimized controls can stray from the true safe set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a way to keep control actions safe when the underlying dynamics are too complex for any explicit model. Instead of equations, the planner receives only samples of feasible actions from a simulator, conditioned on a context signal such as time or observed state. These samples are compressed into a conditional density that defines a curved geometry on the space of possible actions. Gradient steps are taken in that geometry, and the minimum curvature of the log-density sets both how fast the planner converges and how large a safety margin it maintains. The central guarantee is a bound on the distance to the true feasibility manifold that shrinks as the context becomes richer and the density estimate improves.
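The pipeline described above (feasibility samples, then a density, then gradient steps shaped by that density) can be illustrated with a minimal sketch. It assumes a Gaussian kernel density estimate with a fixed bandwidth `h`, a quadratic toy cost, and a plain Euclidean gradient step on c(u) − β ln p̂(u) in place of the paper's Riemannian update; the names `kde_score`, `penalized_step`, `beta`, and `eta` are hypothetical and not taken from the paper.

```python
import numpy as np

def kde_score(u, samples, h):
    """Gradient of ln p_hat at u for a Gaussian KDE with bandwidth h."""
    diffs = samples - u                               # rows are x_i - u
    logw = -0.5 * np.sum(diffs ** 2, axis=1) / h ** 2
    w = np.exp(logw - logw.max())                     # numerically stabilised kernel weights
    w /= w.sum()
    return w @ diffs / h ** 2                         # weighted mean of (x_i - u) / h^2

def penalized_step(u, grad_cost, samples, h, beta, eta):
    """One plain gradient step on c(u) - beta * ln p_hat(u) (assumed objective)."""
    return u - eta * (grad_cost(u) - beta * kde_score(u, samples, h))

# Toy data: feasible actions for one context cluster around the origin,
# while the task goal sits outside the cluster.
rng = np.random.default_rng(0)
feasible = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(300, 2))
target = np.array([2.0, 0.0])

u = np.zeros(2)
for _ in range(200):
    u = penalized_step(u, lambda v: v - target, feasible, h=0.3, beta=1.0, eta=0.05)
```

In this toy setting the iterate settles between the sample cloud and the goal: the score term pulls it back toward the region where feasibility samples are dense, which is the qualitative behaviour the summary attributes to the density-induced geometry.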

Core claim

By turning feasibility samples into a score-based density p̂(u∣ξt), the method performs online Riemannian optimization on the action space; the minimum curvature κ(ξt) of the barrier −ln p̂(·∣ξt) simultaneously governs convergence speed and safety margin, yielding a contextual safety bound in which the distance to the true feasibility manifold is controlled by the score estimation error and a ratio depending on κ(ξt), both of which tighten with richer context.

What carries the argument

The conditional score-based density p̂(u∣ξt) that defines a Riemannian metric on the action space, with its log-density minimum curvature κ(ξt) acting as the single parameter that replaces unknown Lipschitz constants and controls both optimization and safety.
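As a rough illustration of the curvature parameter, the rebuttal section of this page describes κ(ξ_t) as the smallest eigenvalue of the Hessian of −ln p̂, taken over the region where the optimizer stays. The sketch below evaluates that quantity for a Gaussian KDE at a finite set of probe points. The KDE choice, the bandwidth `h`, the probe set, and the function names are assumptions made for illustration; the paper's estimator and operating region may differ.

```python
import numpy as np

def barrier_hessian(u, samples, h):
    """Hessian of -ln p_hat at u for a Gaussian KDE with bandwidth h."""
    diffs = samples - u
    logw = -0.5 * np.sum(diffs ** 2, axis=1) / h ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    mean = w @ diffs
    centered = diffs - mean
    cov = (w[:, None] * centered).T @ centered        # kernel-weighted covariance of x_i - u
    return np.eye(len(u)) / h ** 2 - cov / h ** 4

def barrier_curvature(probes, samples, h):
    """Smallest Hessian eigenvalue of -ln p_hat over a finite set of probe points."""
    # eigvalsh returns eigenvalues in ascending order, so index 0 is the minimum.
    return min(np.linalg.eigvalsh(barrier_hessian(p, samples, h))[0] for p in probes)

# Two well-separated sample clusters: the barrier is concave between them,
# so the estimated curvature at the midpoint is negative.
two_modes = np.array([[-2.0, 0.0], [2.0, 0.0]])
kappa_mid = barrier_curvature([np.zeros(2)], two_modes, h=1.0)
```

The two-cluster example hints at why the operating region matters: between modes of the fitted density the smallest eigenvalue can be negative, so a positive κ only holds on a suitably restricted set.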

If this is right

The distance from the optimized action to the true feasibility manifold decreases as score estimation error falls and as the curvature ratio improves with richer context.
Contextual penalized predictive control outperforms both marginal and frozen density baselines, with the performance gap widening after environment shifts.
The barrier curvature κ(ξt) determines convergence rate without any explicit knowledge of the underlying dynamics.
Safety margins are preserved even when the world model is used only through feasibility samples rather than full trajectories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same density-to-geometry construction could be reused in other black-box planners where only feasible/infeasible labels are available, such as motion planning under sensor noise.
If the context signal is expanded to include predicted future states, the safety bound may tighten further without changing the optimization loop.
The curvature-based safety margin offers a concrete diagnostic: when κ(ξt) drops, the controller can automatically request more simulator samples before proceeding.
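The diagnostic in the last item could look roughly like the loop below. This is a hypothetical sketch of an editorial extension, not something the paper describes: `draw_samples` and `estimate_kappa` are placeholder callables standing in for the Simulator and a curvature estimator, and the threshold `kappa_min` is an assumed tuning parameter. Whether additional samples actually raise the estimated curvature is itself an assumption.

```python
def sample_until_curved(draw_samples, estimate_kappa, kappa_min, batch, max_batches):
    """Request feasibility samples in batches until the estimated curvature
    reaches kappa_min or the batch budget is exhausted."""
    samples = list(draw_samples(batch))
    for _ in range(max_batches - 1):
        if estimate_kappa(samples) >= kappa_min:
            break
        samples.extend(draw_samples(batch))
    return samples, estimate_kappa(samples)

# Toy stand-ins: the "curvature" simply grows with the number of samples.
toy_draw = lambda n: [0.0] * n
toy_kappa = lambda s: len(s) / 100
samples, kappa = sample_until_curved(toy_draw, toy_kappa, kappa_min=0.25, batch=10, max_batches=5)
```

The budget cap keeps the loop from stalling the controller when the curvature never recovers, in which case the caller still receives the final estimate and can fall back to a conservative action.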

Load-bearing premise

Feasibility samples drawn from the black-box simulator can be turned into an accurate score-based density whose curvature faithfully reflects the geometry of the true feasible set.

What would settle it

In the dynamic navigation task, measure the actual Euclidean distance from the planner's output to the true feasibility manifold (that is, to the nearest feasible action) while also computing the score estimation error; if the observed distance consistently exceeds the bound predicted from that error and the κ ratio, the safety claim fails.
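A check of this kind might be scripted as follows. The sketch treats the predicted bound as a per-step input array, because this page does not give the bound's closed form; the function name and the tolerance `tol` are hypothetical.

```python
import numpy as np

def bound_violation_rate(observed_dist, predicted_bound, tol=0.0):
    """Fraction of steps where the observed distance to the feasibility
    manifold exceeds the predicted bound, plus the offending step indices."""
    observed = np.asarray(observed_dist, dtype=float)
    bound = np.asarray(predicted_bound, dtype=float)
    violations = observed > bound + tol
    return violations.mean(), np.flatnonzero(violations)

# Toy numbers only, not results from the paper.
rate, steps = bound_violation_rate([0.1, 0.5, 0.2, 0.9], [0.3, 0.4, 0.3, 0.8])
```

A rate near zero across seeds would be consistent with the claimed bound, while persistent violations at the same steps would point to where the density estimate or the curvature assumption breaks down.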

Figures

Figures reproduced from arXiv: 2604.19639 by Tongxin Li.

**Figure 1.** Two closed-loop control paradigms. (a) Classical model-based control requires an explicit dynamics model f_t to form constraint Jacobians and a CBF-QP safety filter that projects the controller’s action onto the safe set. (b) The contextual control framework replaces the explicit model with a black-box Simulator that produces feasibility samples; a KDE compresses them into a score signal ŝ_t, which the Pla…

**Figure 2.** The Simulator–Planner architecture. The Simulator …

**Figure 3.** Per-step information flow (Algorithms 1–2). The Simulator draws …

**Figure 4.** Contextual adaptation under obstacle reshuffle (…

**Figure 5.** Contextual observation pipeline at the most constraining warmed-up step of two structurally opposite obstacle modes (top: …

**Figure 6.** Main comparison across all methods (T=1000, N=300 feasibility samples per step, five seeds, mean ± std); see Table I. The trajectory figure (…

**Figure 7.** Stiffness ablation validating Proposition 2. The critical …

**Figure 8.** Free-energy landscape at t=200. (a) Cost c(u) with true manifold boundary (blue) and learned M̂_t^α (cyan dashed). (b) Free energy F(u) with the PPC equilibrium u* (red) and density maximizer ū (cyan). (c) Geometric gaps: the empirical ‖u*−ū‖ is well within the theoretical bound G_c/(βκ) from Proposition 2, and dist(u*, ∂M_t) > 0 confirms the equilibrium is safely interior.

**Figure 9.** Effect of sample budget on safety and score estimation …

**Figure 10.** Scalability with number of obstacles (K_o ∈ {3,5,10,15,20}, three seeds, T=300); see Table III. (a) PPC degrades gracefully (0.92 → 0.85) while CBF-QP drops to 0.40 and CEM drops to 0.62. (b) Total tracking cost. (c) Wall-clock time per step.

**Figure 11.** Dynamic regret experiment validating Theorem 2 (…

**Figure 12.** Contextual control ablation (five seeds; …
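The bound quoted in the Figure 8 caption, ‖u*−ū‖ ≤ G_c/(βκ), is consistent with a short strong-convexity argument. The sketch below rests on assumptions that this page does not state: that the free energy has the form F(u) = c(u) − β ln p̂(u∣ξ_t), that the cost gradient is bounded by G_c, and that the barrier b(u) = −ln p̂(u∣ξ_t) is κ-strongly convex on a convex region containing both u* and ū. The paper's Proposition 2 may be formulated differently.

```latex
\begin{aligned}
0 &= \nabla F(u^*) = \nabla c(u^*) + \beta\,\nabla b(u^*),
  \qquad b(u) := -\ln \hat{p}(u \mid \xi_t), \qquad \nabla b(\bar{u}) = 0, \\
\kappa\,\|u^* - \bar{u}\|^2
  &\le \big\langle \nabla b(u^*) - \nabla b(\bar{u}),\; u^* - \bar{u} \big\rangle
   = -\tfrac{1}{\beta}\,\big\langle \nabla c(u^*),\; u^* - \bar{u} \big\rangle
   \le \tfrac{G_c}{\beta}\,\|u^* - \bar{u}\|, \\
\|u^* - \bar{u}\| &\le \frac{G_c}{\beta\,\kappa}.
\end{aligned}
```

Under these assumptions, a larger stiffness β or a larger curvature κ pulls the equilibrium closer to the density maximizer, which matches the role the captions assign to the stiffness ablation.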

Original abstract

Modern world models are becoming too complex to admit explicit dynamical descriptions. We study safety-critical contextual control, where a Planner must optimize a task objective using only feasibility samples from a black-box Simulator, conditioned on a context signal $\xi_t$. We develop a sample-based Penalized Predictive Control (PPC) framework grounded in online Riemannian optimization, in which the Simulator compresses the feasibility manifold into a score-based density $\hat{p}(u \mid \xi_t)$ that endows the action space with a Riemannian geometry guiding the Planner's gradient descent. The barrier curvature $\kappa(\xi_t)$, the minimum curvature of the conditional log-density $-\ln\hat{p}(\cdot\mid\xi_t)$, governs both convergence rate and safety margin, replacing the Lipschitz constant of the unknown dynamics. Our main result is a contextual safety bound showing that the distance from the true feasibility manifold is controlled by the score estimation error and a ratio that depends on $\kappa(\xi_t)$, both of which improve with richer context. Simulations on a dynamic navigation task confirm that contextual PPC substantially outperforms marginal and frozen density models, with the advantage growing after environment shifts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a PPC framework that uses online Riemannian optimization on score-based densities from black-box feasibility samples to get a curvature-dependent contextual safety bound, but the bound's validity rests on assumptions about density estimation that look shaky with binary labels.

The main takeaway is that this work tries to handle safety in contextual control without explicit dynamics by turning simulator feasibility samples into a Riemannian metric via a score-based density, then using the minimum curvature of the log-density as a stand-in for Lipschitz constants in both optimization and safety margins. The simulations on a navigation task show the contextual version pulling ahead of marginal or frozen models, especially after environment changes, which is a concrete plus for the idea of letting richer context tighten the bound through lower score error and higher κ.

Referee Report

2 major / 2 minor

Summary. The paper develops a Penalized Predictive Control (PPC) framework for safety-critical contextual control with complex world models. Feasibility samples from a black-box simulator, conditioned on context ξ_t, are compressed into a score-based density estimate p̂(u∣ξ_t) that induces a Riemannian geometry on the action space. Online Riemannian gradient descent is performed with the minimum curvature κ(ξ_t) of −ln p̂(·∣ξ_t) replacing the unknown Lipschitz constant; the central claim is a contextual safety bound in which distance to the true feasibility manifold is controlled by score-estimation error and a κ-dependent ratio, both of which improve with richer context. Simulations on a dynamic navigation task show that contextual PPC outperforms marginal and frozen density baselines, with the gap widening after environment shifts.

Significance. If the safety bound is rigorously derived and the Riemannian structure is validly induced from binary feasibility labels, the work would offer a geometry-aware alternative to Lipschitz-based analyses for safe control in black-box settings. The use of context-dependent curvature to govern both convergence rate and safety margin, together with online adaptation, is a potentially useful contribution to systems and control, especially if the manuscript supplies machine-checked proofs or reproducible code for the bound.

major comments (2)

[Main theoretical result] Main theoretical result (as stated in the abstract and the derivation of the contextual safety bound): the claim that distance to the true feasibility manifold is bounded by score-estimation error and a ratio depending on κ(ξ_t) requires that the estimated conditional density endows the action space with a valid Riemannian metric whose minimum curvature lower-bounds the true distance everywhere the optimizer operates. Because the simulator supplies only binary feasibility labels, any score estimator must extrapolate the log-density gradient and Hessian from samples; errors are largest near the manifold boundary where density is low. The manuscript must show explicitly (via the definition of the ratio and the region where the curvature bound holds) that the claimed upper bound remains valid under such extrapolation error, or provide a counter-example demonstrating when it fails.
[PPC framework and Riemannian geometry] § on the PPC framework and Riemannian geometry: the construction assumes that finite-sample score estimation from black-box feasibility data yields an accurate Riemannian metric whose min curvature κ(ξ_t) simultaneously controls convergence rate and safety margin. The paper must clarify whether κ(ξ_t) is computed from the fitted density or from an independent geometric quantity, and must address whether online Riemannian steps remain inside the region where the curvature lower bound is guaranteed (especially after environment shifts).

minor comments (2)

[Notation] Notation for the estimated density p̂(u∣ξ_t) and the barrier curvature κ(ξ_t) should be introduced with an explicit equation early in the manuscript so that the ratio appearing in the safety bound can be traced directly to its definition.
[Simulations] The simulation section would benefit from a table reporting the precise context richness levels, number of feasibility samples per context, and quantitative safety-margin values (not only qualitative outperformance) to allow readers to assess how the advantage grows after shifts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. The comments highlight important aspects of the theoretical guarantees and practical implementation of the PPC framework. Below we respond point-by-point to the major comments, indicating where revisions will be made to improve clarity and rigor.

Referee: [Main theoretical result] Main theoretical result (as stated in the abstract and the derivation of the contextual safety bound): the claim that distance to the true feasibility manifold is bounded by score-estimation error and a ratio depending on κ(ξ_t) requires that the estimated conditional density endows the action space with a valid Riemannian metric whose minimum curvature lower-bounds the true distance everywhere the optimizer operates. Because the simulator supplies only binary feasibility labels, any score estimator must extrapolate the log-density gradient and Hessian from samples; errors are largest near the manifold boundary where density is low. The manuscript must show explicitly (via the definition of the ratio and the region where the curvature bound holds) that the claimed upper bound remains valid under such extrapolation error, or provide a counter-example.

Authors: The contextual safety bound (Theorem 3.1) is stated directly in terms of the score estimation error ε(ξ_t), which is defined to capture all discrepancies between the estimated and true conditional densities, including extrapolation effects near the feasibility boundary. The Riemannian metric is induced by the estimated density p̂ via its Hessian, and κ(ξ_t) is the infimum of the curvature of −ln p̂ over the sublevel set in which the optimizer is proven to remain (by the barrier penalty). Because the bound is expressed as a function of ε(ξ_t) and the κ-dependent ratio, it holds by construction whenever the estimation error is finite; the extrapolation error is already folded into ε(ξ_t). We will add a clarifying paragraph in Section 3.3 that explicitly defines the operating region and shows that the ratio remains well-defined under the stated assumptions on ε. revision: partial
Referee: [PPC framework and Riemannian geometry] § on the PPC framework and Riemannian geometry: the construction assumes that finite-sample score estimation from black-box feasibility data yields an accurate Riemannian metric whose min curvature κ(ξ_t) simultaneously controls convergence rate and safety margin. The paper must clarify whether κ(ξ_t) is computed from the fitted density or from an independent geometric quantity, and must address whether online Riemannian steps remain inside the region where the curvature lower bound is guaranteed (especially after environment shifts).

Authors: κ(ξ_t) is computed exclusively from the fitted conditional density estimate p̂(u|ξ_t) as the smallest eigenvalue of the Hessian of −ln p̂ (Eq. 8). It is not an independent geometric quantity. The convergence analysis (Theorem 4.1) shows that the penalized Riemannian gradient steps remain inside the sublevel set where the curvature lower bound is valid; the barrier term prevents escape even when the density estimate is updated online. After environment shifts the context ξ_t triggers a fresh density estimate, and the safety bound adapts with the new κ(ξ_t) and ε(ξ_t). We will insert a short remark in Section 4.2 reiterating the source of κ and noting that the same barrier argument applies post-shift. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation uses independent geometric bounds on estimation error

The paper constructs a Riemannian metric from the fitted score-based density estimate and derives a contextual safety bound relating manifold distance to score error and the curvature κ(ξ_t) of that same estimate. No quoted equations or self-citations reduce the bound to a tautology or fitted input by construction; the result follows from standard online Riemannian optimization analysis applied to the PPC setup, with the curvature term serving as a derived quantity rather than a redefinition of the safety margin itself. The framework remains self-contained against external optimization geometry without load-bearing self-citation chains.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that a black-box simulator's feasibility samples can be turned into a faithful score-based density that defines a useful Riemannian metric, plus the mathematical premise that minimum curvature of the log-density controls both optimization speed and safety margin.

free parameters (1)

barrier curvature κ(ξ_t)
Defined as the minimum curvature of the conditional log-density; appears as the key quantity scaling the safety bound and convergence rate.

axioms (1)

Domain assumption: Feasibility samples from the Simulator can be compressed into an accurate score-based density p̂(u∣ξt) that endows the action space with Riemannian geometry.
Invoked to replace explicit dynamics with geometry derived from the world model.

invented entities (1)

Penalized Predictive Control (PPC) framework (no independent evidence)
purpose: To perform contextual safety-critical optimization by guiding Riemannian gradient descent with simulator-derived densities.
New named framework introduced to combine predictive control, penalties, and online Riemannian methods.

pith-pipeline@v0.9.0 · 5492 in / 1503 out tokens · 74901 ms · 2026-05-10T01:50:15.988595+00:00 · methodology

[1] [1]

Learning to simulate complex physics with graph networks,

A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. Battaglia, “Learning to simulate complex physics with graph networks,” inInternational Conference on Machine Learning, pp. 8459–8468, PMLR, 2020

work page 2020

[2] [2]

Video prediction models as rewards for reinforcement learning,

A. Escontrela, A. Adeniji, W. Tong, B. Mazoure, and P. Abbeel, “Video prediction models as rewards for reinforcement learning,” inAdvances in Neural Information Processing Systems, vol. 36, 2024

work page 2024

[3] [3]

Recurrent world models facilitate policy evolution,

D. Ha and J. Schmidhuber, “Recurrent world models facilitate policy evolution,” inAdvances in Neural Information Processing Systems, vol. 31, 2018

work page 2018

[4] [4]

Genie: Generative interactive environments,

J. Bruce, M. Dennis, A. Edwards, J. Parker-Holder, Y . Shi, E. Hughes, M. Lai, A. Mavalankar, R. Steiber, C. Rae,et al., “Genie: Generative interactive environments,” inProceedings of the 41st International Conference on Machine Learning, pp. 4583–4612, PMLR, 2024

work page 2024

[5] [5]

Ctrl-world: A controllable gener- ative world model for robot manipulation,

Y . Guo, L. Shi, J. Chen, and C. Finn, “Ctrl-world: A controllable gener- ative world model for robot manipulation,” inInternational Conference on Learning Representations (ICLR), 2026

work page 2026

[6] [6]

Information aggrega- tion for constrained online control,

T. Li, Y . Chen, B. Sun, A. Wierman, and S. Low, “Information aggrega- tion for constrained online control,”ACM SIGMETRICS Performance Evaluation Review, vol. 49, no. 1, pp. 7–8, 2021

work page 2021

[7] [7]

Learning- based predictive control via real-time aggregate flexibility,

T. Li, B. Sun, Y . Chen, Z. Ye, S. H. Low, and A. Wierman, “Learning- based predictive control via real-time aggregate flexibility,”IEEE Transactions on Smart Grid, vol. 12, no. 6, pp. 4897–4913, 2021

work page 2021

[8] [8]

Precog: Prediction conditioned on goals in visual multi-agent settings,

N. Rhinehart, R. McAllister, K. Kitani, and S. Levine, “Precog: Prediction conditioned on goals in visual multi-agent settings,” inProceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2821–2830, 2019

work page 2019

[9] [9]

Linearly-solvable Markov decision problems,

E. Todorov, “Linearly-solvable Markov decision problems,” inAdvances in Neural Information Processing Systems, vol. 19, 2006

work page 2006

[10] [10]

Linear theory for control of nonlinear stochastic systems,

H. J. Kappen, “Linear theory for control of nonlinear stochastic systems,” Physical Review Letters, vol. 95, no. 20, p. 200201, 2005

work page 2005

[11] [11]

Land- ing with the score: Riemannian optimization through denoising,

A. Kharitenko, Z. Shen, R. De Santi, N. He, and F. Doerfler, “Land- ing with the score: Riemannian optimization through denoising,” in International Conference on Learning Representations (ICLR), 2025

work page 2025

[12] [12]

Model predictive control: Theory and practice—a survey,

C. E. Garcia, D. M. Prett, and M. Morari, “Model predictive control: Theory and practice—a survey,”Automatica, vol. 25, no. 3, pp. 335–348, 1989

work page 1989

[13] A. Bemporad and M. Morari, “Robust model predictive control: A survey,” in Robustness in Identification and Control, pp. 207–226, Springer, 1999

[14] D. Q. Mayne, M. M. Seron, and S. Raković, “Robust model predictive control of constrained linear systems with bounded disturbances,” Automatica, vol. 41, no. 2, pp. 219–224, 2005

[15] J. Berberich, J. Köhler, M. A. Müller, and F. Allgöwer, “Data-driven model predictive control with stability and robustness guarantees,” IEEE Transactions on Automatic Control, vol. 66, no. 4, pp. 1702–1717, 2020

[16] L. Bold, L. Grüne, M. Schaller, and K. Worthmann, “Data-driven MPC with stability guarantees using extended dynamic mode decomposition,” IEEE Transactions on Automatic Control, 2024

[17] U. Rosolia, X. Zhang, and F. Borrelli, “Data-driven predictive control for autonomous systems,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, no. 1, pp. 259–286, 2018

[18] K. P. Wabersich and M. N. Zeilinger, “A predictive safety filter for learning-based control of constrained nonlinear dynamical systems,” Automatica, vol. 129, p. 109597, 2021

[19] J. Buerger and M. Cannon, “Robust adaptive NMPC using ellipsoidal tubes,” Automatica, 2026. Submitted

[20] R. Y. Rubinstein, “The cross-entropy method for combinatorial and continuous optimization,” Methodology and Computing in Applied Probability, vol. 1, no. 2, pp. 127–190, 1999

[21] K. Chua, R. Calandra, R. McAllister, and S. Levine, “Deep reinforcement learning in a handful of trials using probabilistic dynamics models,” in Advances in Neural Information Processing Systems, vol. 31, 2018

[22] C. Pinneri, S. Sawant, S. Blaes, J. Achterhold, J. Stueckler, M. Rolinek, and G. Martius, “Sample-efficient cross-entropy method for real-time planning,” in Conference on Robot Learning, pp. 1049–1065, PMLR, 2021

[23] N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” Proceedings of the 37-th Annual Allerton Conference on Communication, 2000

[24] A. Hyvärinen, “Estimation of non-normalized statistical models by score matching,” Journal of Machine Learning Research, vol. 6, pp. 695–709, 2005

[25] P. Vincent, “A connection between score matching and denoising autoencoders,” Neural Computation, vol. 23, no. 7, pp. 1661–1674, 2011

[26] N. Boumal, An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023

[27] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008

[28] D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in International Conference on Machine Learning, pp. 1530–1538, PMLR, 2015

[29] I. Kobyzev, S. J. Prince, and M. A. Brubaker, “Normalizing flows: An introduction and review of current methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 11, pp. 3964–3979, 2020

[30] C.-H. Chao, C. Feng, W.-F. Sun, C.-K. Lee, S. See, and C.-Y. Lee, “Maximum entropy reinforcement learning via energy-based normalizing flow,” Advances in Neural Information Processing Systems, vol. 37, pp. 56136–56165, 2024

[31] A. Taylor, A. Singletary, Y. Yue, and A. Ames, “Learning for safety-critical control with control barrier functions,” in Learning for Dynamics and Control, pp. 708–717, PMLR, 2020

[32] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” in 2019 18th European Control Conference (ECC), pp. 3420–3431, IEEE, 2019

[33] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2016

[34] T. Pati and S. Z. Yong, “Robust control barrier functions for uncertain parameter-varying control affine systems with set-membership parameter estimation,” IEEE Transactions on Automatic Control, 2025

[35] K. He, S. Shi, T. van den Boom, and B. De Schutter, “From learning to safety: A direct data-driven framework for constrained control,” IEEE Transactions on Automatic Control, 2026. Early access

[36] M. Bajelani and K. van Heusden, “Data-driven input-output control barrier functions,” IEEE Transactions on Automatic Control, 2026. Early access

[37] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, “End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 3387–3395, 2019

[38] A. B. Tsybakov, Introduction to Nonparametric Estimation. Springer Series in Statistics, Springer, 2009