arxiv: 2606.00089 · v1 · pith:TYCITYESnew · submitted 2026-05-23 · 💻 cs.RO · cs.AI

Can Predicted Dynamics Exist in the Physical World?

Pith reviewed 2026-06-30 12:59 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords physical admissibility · predictive dynamics · robot control · executability · kinematic conditions · dynamic residuals · LeRobot PushT · prediction error

The pith

Decoded proposals from predictive models can be checked for physical executability by treating them as candidate dynamics and testing kinematic, dynamic, and horizon conditions before execution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that low root-mean-square error on predicted states or actions does not guarantee a proposal can be executed in the physical world. It formulates physical admissibility as an interface that evaluates a decoded proposal against kinematic, dynamic, and direct-to-composed horizon conditions, rejecting those that violate the physical envelope and supplying a component-level reason for rejection. A sympathetic reader would care because robotic systems that act on invalid dynamics risk task failure or hardware damage, while a usable filter could remove most bad proposals without harming overall progress. Experiments on the LeRobot PushT task show the full gate reaching an AUC of 0.957 for detecting invalid proposals, with residual-based filters and the gate together blocking 87 to 89 percent of them while keeping mean progress near 0.998.

Core claim

Physical admissibility is formulated as a prediction-control interface. Before execution, a decoded proposal is treated as candidate dynamics and evaluated using kinematic, dynamic, and direct-to-composed horizon conditions. Passing these does not certify task success, but rejection identifies violation of the specified physical envelope and supplies a component-level reason. Controlled falsification on Hugging Face LeRobot PushT shows one-step prediction-RMSE reaching AUC 0.982, standardized dynamics residuals AUC 0.972, kinematic-only conditions AUC 0.592, and the full gate AUC 0.957 with attribution. Replay-based intervention experiments demonstrate that residual-based filters and the ful

What carries the argument

The physical-admissibility gate, which treats a decoded proposal as candidate dynamics and applies kinematic, dynamic, and direct-to-composed horizon conditions to test physical executability.
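A minimal sketch of such a gate, assuming a planar rollout of positions sampled at a fixed time step. The function `admissibility_gate`, the limits `v_max` and `a_max`, and the tolerance `horizon_tol` are hypothetical names chosen for illustration; the paper's actual conditions and thresholds are not specified in this summary and may differ.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Verdict:
    accepted: bool
    reasons: list  # names of the condition families that failed


def admissibility_gate(positions, dt, v_max, a_max,
                       composed_end=None, horizon_tol=0.0):
    """Check a proposed 2D position rollout against illustrative limits.

    positions: list of (x, y) waypoints sampled every dt seconds.
    composed_end: optional end point obtained by composing one-step
        predictions, compared with the direct rollout's end point.
    """
    reasons = []

    # Finite-difference velocities between consecutive waypoints.
    vels = [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]
    # Kinematic condition (assumed form): speed stays within v_max.
    if any(hypot(vx, vy) > v_max for vx, vy in vels):
        reasons.append("kinematic")

    # Dynamic condition (assumed form): acceleration stays within a_max.
    accs = [((vx1 - vx0) / dt, (vy1 - vy0) / dt)
            for (vx0, vy0), (vx1, vy1) in zip(vels, vels[1:])]
    if any(hypot(ax, ay) > a_max for ax, ay in accs):
        reasons.append("dynamic")

    # Horizon condition (assumed form): direct and composed end points agree.
    if composed_end is not None:
        ex, ey = positions[-1]
        cx, cy = composed_end
        if hypot(ex - cx, ey - cy) > horizon_tol:
            reasons.append("horizon")

    return Verdict(accepted=not reasons, reasons=reasons)
```

Returning the list of failed condition families, rather than a single boolean, is what gives the caller a component-level reason for a rejection.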

If this is right

  • Prediction error alone does not ensure a proposal is physically executable.
  • The gate supplies condition-level attribution to explain why a proposal is rejected.
  • Residual-based filters combined with the gate block most invalid proposals.
  • Task progress remains nearly unchanged after the filtering step.
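To make the last two bullets concrete, the sketch below computes a prevention rate and a retained-progress ratio from a toy replay log. The record fields (`invalid`, `accepted`, `progress`) and the function `replay_summary` are assumptions for illustration only; the paper's replay protocol and its exact definition of mean progress are not given in this summary.

```python
def replay_summary(records):
    """Summarize a replay log.

    records: list of dicts with keys 'invalid' (bool), 'accepted' (bool),
    and 'progress' (float). All three fields are hypothetical.
    """
    invalid = [r for r in records if r["invalid"]]
    valid = [r for r in records if not r["invalid"]]

    # Prevention rate: share of invalid proposals that the gate rejected.
    prevented = sum(1 for r in invalid if not r["accepted"])
    prevention_rate = prevented / len(invalid) if invalid else float("nan")

    # Retained progress (assumed definition): progress of accepted valid
    # proposals relative to the progress of all valid proposals.
    total = sum(r["progress"] for r in valid)
    kept = sum(r["progress"] for r in valid if r["accepted"])
    retained = kept / total if total else float("nan")

    return prevention_rate, retained
```

A filter that rejects everything would score a perfect prevention rate, which is why the two numbers need to be read together.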

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The gate could be inserted upstream of execution in other predictive robotic controllers to reduce unsafe actions.
  • The same condition structure might apply to different robot platforms if the kinematic and dynamic thresholds are retuned.
  • Longer-horizon proposals could be checked by extending the direct-to-composed horizon test rather than relying solely on one-step residuals.
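The direct-to-composed idea in the last bullet can be illustrated with a toy consistency check: a direct multi-step prediction is compared with the result of composing a one-step model over the same action chunk. The callables `step` and `direct`, the scalar state, and the function names are hypothetical; this is a sketch of the general idea, not the paper's horizon test.

```python
def composed_rollout(step, state, actions):
    """Apply a one-step model repeatedly over an action chunk."""
    for a in actions:
        state = step(state, a)
    return state


def horizon_gap(step, direct, state, actions):
    """Absolute gap between the direct and the composed end state."""
    return abs(direct(state, actions) - composed_rollout(step, state, actions))


# Toy scalar dynamics: each action adds to the state.
step = lambda s, a: s + a
direct_consistent = lambda s, acts: s + sum(acts)
direct_biased = lambda s, acts: s + sum(acts) + 1.0  # disagrees with composition

actions = [0.5, 0.25, 0.25]
gap_ok = horizon_gap(step, direct_consistent, 0.0, actions)
gap_bad = horizon_gap(step, direct_biased, 0.0, actions)
```

A gate could then reject a proposal whose gap exceeds a tolerance, and that tolerance would need tuning per platform.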

Load-bearing premise

The kinematic, dynamic, and direct-to-composed horizon conditions together form a sufficient test for physical executability of a decoded proposal.

What would settle it

An experiment in which a proposal passes all three condition sets yet cannot be executed physically, or fails the conditions yet executes successfully without violation.

Figures

Figures reproduced from arXiv: 2606.00089 by Barak Or.

Figure 1: Central interface: predictive Physical AI outputs are evaluated as candidate dynamics
Figure 2: Dynamic-violation results on LeRobot PushT: detector-level AUC/AP and nominal
Figure 3: Condition-level results: monitor ablation and direct-to-composed flow agreement.
Figure 4: Replay intervention outcomes. Each runtime gate either accepts a proposed action chunk
Figure 5: Training losses for the compact PushT world-model baselines used in the monitor evalua
Figure 6: Illustrative LeRobot PushT episode. The image rollout shows the manipulation sequence,
Figure 7: Sensitivity of the runtime gate to envelope quantile, envelope margin, and decision thresh
Figure 8: Additional replay progress and decision-utility outcomes complementing Figure
Figure 9: Additional runtime-gate analyses. The left panel reports sensitivity to empirical-envelope
Original abstract

Predictive Physical AI systems output state rollouts, action chunks, and latent plans, yet a low root-mean-square error (RMSE) does not imply that a particular proposal is physically executable. We formulate physical admissibility as a prediction-control interface: before execution, a decoded proposal is treated as candidate dynamics and evaluated using kinematic, dynamic, and direct-to-composed horizon conditions. Passing is not a certificate of task success; rejection identifies violation of the specified physical envelope and gives a component-level reason. On Hugging Face LeRobot PushT, controlled falsification shows that one-step prediction-RMSE and standardized dynamics residuals reach area under the receiver operating characteristic curve (AUC) 0.982 and 0.972, kinematic-only conditions reach AUC 0.592, and the full gate reaches AUC 0.957 with condition-level attribution. In replay-based intervention experiments, residual-based filters and the full physical-admissibility gate prevent 87-$89% of invalid proposals while preserving mean progress near 0.998.
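The AUC figures quoted in the abstract can be read as the probability that a randomly chosen invalid proposal receives a higher detector score than a randomly chosen valid one. The sketch below computes that pairwise statistic directly; the function name `auc` and the example scores are made up for illustration and are unrelated to the paper's data.

```python
def auc(scores_invalid, scores_valid):
    """Probability that an invalid proposal outscores a valid one.

    Ties count as half a win, which matches the usual rank-based
    definition of the area under the ROC curve.
    """
    wins = 0.0
    for si in scores_invalid:
        for sv in scores_valid:
            if si > sv:
                wins += 1.0
            elif si == sv:
                wins += 0.5
    return wins / (len(scores_invalid) * len(scores_valid))


# Hypothetical detector scores (higher means "more likely invalid").
example = auc([0.9, 0.8, 0.4], [0.1, 0.3, 0.5])
```

Under this reading, an AUC near 0.5 means the detector barely separates invalid from valid proposals, and a value near 1.0 means nearly every invalid proposal outscores every valid one.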

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that physical admissibility of predicted dynamics can be formulated as a prediction-control interface using kinematic, dynamic, and direct-to-composed horizon conditions. On the LeRobot PushT dataset, controlled falsification experiments show AUCs of 0.982 for one-step prediction-RMSE, 0.972 for dynamics residuals, 0.592 for kinematic conditions, and 0.957 for the full gate, with the gate preventing 87-89% of invalid proposals while preserving progress near 0.998.

Significance. If the proposed conditions are demonstrated to be sufficient and non-redundant for physical executability, the work offers a useful pre-execution filter for robotic prediction systems with interpretable rejection reasons. Strengths include the use of a public dataset, condition-level attribution in results, and quantitative reporting of AUC and prevention rates.

major comments (1)
  1. [abstract (paragraph on formulation of physical admissibility)] The central claim relies on the assumption that the kinematic, dynamic, and direct-to-composed horizon conditions are jointly sufficient to identify all physical violations (i.e., no physically invalid proposal passes all three). However, no derivation, proof, or argument is provided to establish this sufficiency or non-redundancy; the evaluation is limited to empirical AUC on one dataset split without addressing potential unmodeled effects.
minor comments (2)
  1. [abstract] The description of how the horizon conditions are computed, the exact setup of controlled falsification, and any error bars or statistical details on the AUC values are absent, which affects reproducibility and assessment of the reported numbers.
  2. [abstract] The prevention rate is written as '87-$89%'; this appears to be a LaTeX artifact and should be corrected to '87-89%'.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive comments. We address the major comment point by point below, clarifying the scope and intent of the formulation in the manuscript.

Point-by-point responses
  1. Referee: [abstract (paragraph on formulation of physical admissibility)] The central claim relies on the assumption that the kinematic, dynamic, and direct-to-composed horizon conditions are jointly sufficient to identify all physical violations (i.e., no physically invalid proposal passes all three). However, no derivation, proof, or argument is provided to establish this sufficiency or non-redundancy; the evaluation is limited to empirical AUC on one dataset split without addressing potential unmodeled effects.

    Authors: We appreciate this observation but note that the manuscript does not advance a claim of joint sufficiency for identifying all possible physical violations. Physical admissibility is explicitly formulated as a prediction-control interface that rejects proposals violating the specified kinematic, dynamic, and direct-to-composed horizon conditions, with the abstract stating that 'rejection identifies violation of the specified physical envelope and gives a component-level reason.' These conditions define an operational filter rather than a complete physical model; no derivation or proof of completeness is provided because the work presents a practical, interpretable interface rather than a theorem establishing exhaustiveness. The evaluation uses controlled falsification on the public LeRobot PushT dataset to quantify detection performance (AUC 0.957 for the full gate) and intervention effectiveness (87-89% prevention of invalid proposals), with condition-level attribution. We acknowledge that the empirical scope is limited to one dataset split and does not address unmodeled effects outside the defined conditions, consistent with the paper's focus on this specific filter.

    Revision: no

Circularity Check

0 steps flagged

No significant circularity; metrics derived from external dataset evaluation

Full rationale

The paper formulates physical admissibility via three families of conditions (kinematic, dynamic, direct-to-composed horizon) and reports AUC values plus intervention prevention rates on the external Hugging Face LeRobot PushT dataset via controlled falsification. No equations, fitted parameters, or self-citations are shown that reduce the reported AUC 0.957, 87-89% prevention figures, or condition-level attributions to quantities defined or fitted inside the same paper. The evaluation remains statistically independent of the condition definitions themselves, satisfying the requirement for an externally falsifiable measurement rather than a self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unstated premise that the three named condition families are both necessary and jointly sufficient for physical admissibility; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • Domain assumption: kinematic, dynamic, and direct-to-composed horizon conditions together identify violations of the physical envelope.
    Invoked in the formulation of physical admissibility as the basis on which a rejection yields a component-level reason.

pith-pipeline@v0.9.1-grok · 5694 in / 1323 out tokens · 18711 ms · 2026-06-30T12:59:34.756708+00:00 · methodology
