Can Predicted Dynamics Exist in the Physical World?

Barak Or

arxiv: 2606.00089 · v1 · pith:TYCITYES · submitted 2026-05-23 · cs.RO · cs.AI


Pith reviewed 2026-06-30 12:59 UTC · model grok-4.3

classification cs.RO cs.AI

keywords physical admissibility · predictive dynamics · robot control · executability · kinematic conditions · dynamic residuals · LeRobot PushT · prediction error


The pith

Decoded proposals from predictive models can be checked for physical executability by treating them as candidate dynamics and testing kinematic, dynamic, and horizon conditions before execution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that low root-mean-square error on predicted states or actions does not guarantee that a proposal can be executed in the physical world. It formulates physical admissibility as an interface that evaluates a decoded proposal against kinematic, dynamic, and direct-to-composed horizon conditions, rejecting proposals that violate the physical envelope and supplying a component-level reason for each rejection. A sympathetic reader would care because robotic systems that act on invalid dynamics risk task failure or hardware damage, while a usable filter could remove most bad proposals without harming overall progress. Experiments on the LeRobot PushT task show the full gate reaching an AUC of 0.957 for detecting invalid proposals, with residual-based filters and the full gate each blocking 87 to 89 percent of them while keeping mean progress near 0.998.

Core claim

Physical admissibility is formulated as a prediction-control interface. Before execution, a decoded proposal is treated as candidate dynamics and evaluated using kinematic, dynamic, and direct-to-composed horizon conditions. Passing these does not certify task success, but rejection identifies a violation of the specified physical envelope and supplies a component-level reason. Controlled falsification on Hugging Face LeRobot PushT shows one-step prediction-RMSE reaching AUC 0.982, standardized dynamics residuals AUC 0.972, kinematic-only conditions AUC 0.592, and the full gate AUC 0.957 with attribution. Replay-based intervention experiments demonstrate that residual-based filters and the full gate prevent 87-89% of invalid proposals while preserving mean progress near 0.998.
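The standardized-residual detector in the core claim can be pictured as follows: fit the mean and spread of one-step residuals on nominal transitions, then score a proposed rollout by its largest standardized residual. This is a minimal sketch under stated assumptions, not the paper's detector; the constant-velocity `step_fn`, the control period `DT`, the synthetic data, and all function names are hypothetical.

```python
import numpy as np

DT = 0.1  # hypothetical control period in seconds


def step_fn(states: np.ndarray) -> np.ndarray:
    """Toy one-step model (assumption): state = [position, velocity], constant velocity."""
    x, v = states[:, 0], states[:, 1]
    return np.stack([x + DT * v, v], axis=1)


def fit_residual_stats(states, next_states):
    """Per-dimension mean and std of one-step residuals on nominal transitions."""
    residuals = next_states - step_fn(states)
    return residuals.mean(axis=0), residuals.std(axis=0) + 1e-8  # avoid divide-by-zero


def standardized_residual_score(rollout, mu, sigma):
    """Largest absolute z-scored residual over all consecutive pairs in a rollout."""
    z = (rollout[1:] - step_fn(rollout[:-1]) - mu) / sigma
    return float(np.abs(z).max())


# Nominal transitions: model prediction plus small noise (synthetic, for illustration).
rng = np.random.default_rng(0)
s = rng.normal(size=(500, 2))
s_next = step_fn(s) + rng.normal(scale=0.01, size=(500, 2))
mu, sigma = fit_residual_stats(s, s_next)

# A smooth proposed rollout generated by the model itself.
smooth = [np.array([0.0, 1.0])]
for _ in range(10):
    smooth.append(step_fn(smooth[-1][None, :])[0])
smooth = np.array(smooth)

# The same rollout with a physically implausible 0.5-unit position jump at t = 5.
jumpy = smooth.copy()
jumpy[5:, 0] += 0.5

print(standardized_residual_score(smooth, mu, sigma))  # small
print(standardized_residual_score(jumpy, mu, sigma))   # large
```

Taking the maximum over time means a single implausible transition is enough to flag a rollout; averaging would dilute it over longer horizons.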

What carries the argument

The physical-admissibility gate, which treats a decoded proposal as candidate dynamics and applies kinematic, dynamic, and direct-to-composed horizon conditions to test physical executability.
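One way such a gate could be organized, sketched under stated assumptions: the kinematic check uses finite-difference speed and acceleration limits, while the dynamic-residual score and the direct-to-composed gap are assumed to be computed upstream and passed in. The `Envelope` fields, limit values, and reason strings are hypothetical; the point is that every failed condition is recorded, which is what gives component-level attribution.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Envelope:
    """Hypothetical physical envelope; real limits would be set from nominal data."""
    max_speed: float
    max_accel: float
    max_residual_z: float
    max_horizon_gap: float


@dataclass
class Decision:
    accepted: bool
    reasons: list = field(default_factory=list)


def kinematic_violations(positions, dt, env):
    """Finite-difference speed and acceleration checks on a (T, D) position trajectory."""
    vel = np.diff(positions, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    reasons = []
    if len(vel) and np.linalg.norm(vel, axis=1).max() > env.max_speed:
        reasons.append("kinematic: speed")
    if len(acc) and np.linalg.norm(acc, axis=1).max() > env.max_accel:
        reasons.append("kinematic: acceleration")
    return reasons


def admissibility_gate(positions, dt, residual_z, horizon_gap, env):
    """Accept only if no condition fails; otherwise list every failed condition."""
    reasons = kinematic_violations(positions, dt, env)
    if residual_z > env.max_residual_z:
        reasons.append("dynamic: residual")
    if horizon_gap > env.max_horizon_gap:
        reasons.append("horizon: direct vs composed")
    return Decision(accepted=not reasons, reasons=reasons)


env = Envelope(max_speed=2.0, max_accel=20.0, max_residual_z=4.0, max_horizon_gap=0.05)
slow = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]])  # 1 unit/s at dt = 0.1
fast = slow * 5.0                                                   # 5 units/s
print(admissibility_gate(slow, 0.1, residual_z=1.0, horizon_gap=0.01, env=env))
# Decision(accepted=True, reasons=[])
print(admissibility_gate(fast, 0.1, residual_z=6.0, horizon_gap=0.01, env=env))
# Decision(accepted=False, reasons=['kinematic: speed', 'dynamic: residual'])
```

Collecting all failures rather than stopping at the first one keeps the attribution complete when several conditions are violated at once.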

If this is right

Prediction error alone does not ensure a proposal is physically executable.
The gate supplies condition-level attribution to explain why a proposal is rejected.
Residual-based filters and the full gate each block most invalid proposals (87-89%).
Task progress remains nearly unchanged after the filtering step.
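The quantities behind the last two items could be computed from a replay log along these lines. A minimal sketch, assuming each proposal carries a ground-truth invalid label from the falsification setup and that task progress is a per-episode score in [0, 1]; the function and field names are hypothetical, and the paper's exact definitions of prevention and progress may differ.

```python
import numpy as np


def intervention_summary(is_invalid, accepted, episode_progress):
    """Summarize a replay run in which a gate accepts or rejects each proposal.

    is_invalid:       ground-truth label per proposal (True = physically invalid)
    accepted:         gate decision per proposal (True = executed)
    episode_progress: task-progress score per replayed episode (assumed in [0, 1])
    """
    is_invalid = np.asarray(is_invalid, dtype=bool)
    accepted = np.asarray(accepted, dtype=bool)
    n_invalid = max(int(is_invalid.sum()), 1)
    n_valid = max(int((~is_invalid).sum()), 1)
    return {
        # Share of invalid proposals the gate stopped before execution.
        "prevention_rate": float((is_invalid & ~accepted).sum() / n_invalid),
        # Share of valid proposals the gate rejected (the cost side of filtering).
        "false_rejection_rate": float((~is_invalid & ~accepted).sum() / n_valid),
        "mean_progress": float(np.mean(episode_progress)),
    }


summary = intervention_summary(
    is_invalid=[True, True, True, True, False, False, False, False],
    accepted=[False, False, False, True, True, True, True, False],
    episode_progress=[1.0, 1.0, 0.75, 1.0],
)
print(summary)
# {'prevention_rate': 0.75, 'false_rejection_rate': 0.25, 'mean_progress': 0.9375}
```

Reporting the false-rejection rate next to the prevention rate shows what filtering costs on valid proposals, which mean progress captures only indirectly.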

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The gate could be inserted upstream of execution in other predictive robotic controllers to reduce unsafe actions.
The same condition structure might apply to different robot platforms if the kinematic and dynamic thresholds are retuned.
Longer-horizon proposals could be checked by extending the direct-to-composed horizon test rather than relying solely on one-step residuals.
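The last extension depends on what a direct-to-composed horizon test measures. One plausible reading, offered as an assumption rather than the paper's definition: compare a model's direct k-step prediction with the state reached by composing its one-step prediction k times, and treat a large gap as evidence that the proposal is not internally consistent dynamics. The toy models and names below are hypothetical.

```python
import numpy as np

DT = 0.1  # hypothetical control period


def one_step(state):
    """Toy one-step model: state = [position, velocity], constant velocity."""
    return np.array([state[0] + DT * state[1], state[1]])


def direct_consistent(state, k):
    """Toy direct k-step model that agrees with composing one_step k times."""
    return np.array([state[0] + k * DT * state[1], state[1]])


def direct_drifting(state, k):
    """Toy direct k-step model with a spurious position drift of 0.05 per step."""
    return direct_consistent(state, k) + np.array([0.05 * k, 0.0])


def horizon_gap(state, direct_k_step, k):
    """Euclidean distance between the direct and the composed k-step predictions."""
    composed = np.asarray(state, dtype=float)
    for _ in range(k):
        composed = one_step(composed)
    return float(np.linalg.norm(direct_k_step(state, k) - composed))


s0 = np.array([0.0, 1.0])
print(horizon_gap(s0, direct_consistent, k=5))  # ~0 (floating-point round-off only)
print(horizon_gap(s0, direct_drifting, k=5))    # about 0.25
```

Increasing `k` lengthens the tested horizon without changing the check itself, which is the sense in which the horizon test could be extended.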

Load-bearing premise

The kinematic, dynamic, and direct-to-composed horizon conditions together form a sufficient test for physical executability of a decoded proposal.

What would settle it

An experiment in which a proposal passes all three condition sets yet cannot be executed physically, or fails the conditions yet executes successfully without violation.

Figures

Figures reproduced from arXiv: 2606.00089 by Barak Or.

**Figure 2.** Dynamic-violation results on LeRobot PushT: detector-level AUC/AP and nominal …

**Figure 3.** Condition-level results: monitor ablation and direct-to-composed flow agreement.

**Figure 4.** Replay intervention outcomes. Each runtime gate either accepts a proposed action chunk …

**Figure 5.** Training losses for the compact PushT world-model baselines used in the monitor evalua…

**Figure 6.** Illustrative LeRobot PushT episode. The image rollout shows the manipulation sequence, …

**Figure 7.** Sensitivity of the runtime gate to envelope quantile, envelope margin, and decision thresh…

**Figure 8.** Additional replay progress and decision-utility outcomes complementing Figure …

**Figure 9.** Additional runtime-gate analyses. The left panel reports sensitivity to empirical-envelope …
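Figure 7 names three knobs for the runtime gate: an envelope quantile, an envelope margin, and a decision threshold. Assuming (our assumption, not the paper's stated rule) that a per-feature limit is an upper quantile of nominal values inflated by a relative margin, the sketch below shows how the quantile alone moves the rate at which nominal data would be rejected. `fit_envelope`, `nominal_rejection_rate`, and the toy data are hypothetical.

```python
import numpy as np


def fit_envelope(nominal_values, quantile=0.99, margin=0.0):
    """Per-feature upper limit: a quantile of nominal data, inflated by a relative margin."""
    q = np.quantile(np.asarray(nominal_values, dtype=float), quantile, axis=0)
    return q * (1.0 + margin)


def nominal_rejection_rate(nominal_values, limit):
    """Fraction of nominal samples that exceed the limit, i.e. false alarms on valid data."""
    return float(np.mean(np.asarray(nominal_values, dtype=float) > limit))


# Toy nominal feature, e.g. a speed magnitude: the values 0, 1, ..., 99.
nominal = np.arange(100)
for q in (0.90, 0.95, 0.99):
    limit = fit_envelope(nominal, quantile=q)
    print(q, round(float(limit), 2), nominal_rejection_rate(nominal, limit))
# 0.9 89.1 0.1
# 0.95 94.05 0.05
# 0.99 98.01 0.01
```

Adding a margin widens the envelope further: with `margin=0.1` the 0.99 limit rises to roughly 107.8 and no nominal sample is rejected, at the price of letting larger violations through.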

Original abstract

Predictive Physical AI systems output state rollouts, action chunks, and latent plans, yet a low root-mean-square error (RMSE) does not imply that a particular proposal is physically executable. We formulate physical admissibility as a prediction-control interface: before execution, a decoded proposal is treated as candidate dynamics and evaluated using kinematic, dynamic, and direct-to-composed horizon conditions. Passing is not a certificate of task success; rejection identifies violation of the specified physical envelope and gives a component-level reason. On Hugging Face LeRobot PushT, controlled falsification shows that one-step prediction-RMSE and standardized dynamics residuals reach area under the receiver operating characteristic curve (AUC) 0.982 and 0.972, kinematic-only conditions reach AUC 0.592, and the full gate reaches AUC 0.957 with condition-level attribution. In replay-based intervention experiments, residual-based filters and the full physical-admissibility gate prevent 87-89% of invalid proposals while preserving mean progress near 0.998.
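AUC here can be read as the probability that a randomly chosen invalid proposal receives a higher detector score than a randomly chosen valid one, so 0.5 is chance and 1.0 is perfect separation; on that scale the kinematic-only 0.592 is only modestly above chance. A minimal pairwise (Mann-Whitney) sketch, assuming higher scores mean "more likely invalid"; `roc_auc` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def roc_auc(scores, is_invalid):
    """AUC as P(score of an invalid proposal > score of a valid one); ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_invalid, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one invalid and one valid proposal")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Four proposals: two valid (False), two invalid (True).
print(roc_auc([0.1, 0.4, 0.35, 0.8], [False, False, True, True]))  # 0.75
print(roc_auc([0.2, 0.2, 0.2, 0.2], [False, False, True, True]))   # 0.5
```

The pairwise form costs O(n*m) comparisons and is meant for illustration; library routines such as scikit-learn's `roc_auc_score` compute the same quantity from ranks.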

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reports a workable empirical filter for flagging non-executable dynamics predictions on one robotics dataset, but the conditions are treated as sufficient without proof that they catch all physical violations.


The core contribution is an empirical check that combines kinematic bounds, dynamic residuals, and horizon consistency to gate predicted rollouts before execution. On the LeRobot PushT split it reaches AUC 0.957 overall and blocks 87-89% of invalid proposals while keeping task progress near 0.998. That is a concrete, usable number for anyone running learned controllers on physical hardware.

What the work does cleanly is separate the prediction step from the control interface and show component-level attribution. The one-step RMSE and residual filters already do most of the detection work (AUC 0.982 and 0.972); the kinematic checks add little detection power on their own (AUC 0.592), and the full gate scores slightly below the residual detectors (AUC 0.957) while adding condition-level attribution. Using a public dataset and reporting separate AUCs for each family of checks makes the result easy to inspect.

The soft spot is that the three condition families are presented as jointly covering physical executability, yet nothing derives or bounds their completeness. An unmodeled effect outside the tested horizon or contact model could still produce an invalid rollout that passes all three checks. The abstract gives no error bars and does not summarize the threshold-sensitivity analysis shown in Figure 7, and nothing argues that the chosen residuals and bounds exhaust the relevant physics. The high AUC therefore reflects performance on the constructed falsification set, not a general guarantee.

This is aimed at people building predictive models for robot control who need a lightweight pre-execution filter. A reader already working on safety layers or residual-based monitoring will find the numbers directly applicable. The experiments are specific enough that a referee can evaluate the claims on their own terms, so the paper should go to review rather than desk rejection.

Referee Report

1 major / 2 minor

Summary. The manuscript claims that physical admissibility of predicted dynamics can be formulated as a prediction-control interface using kinematic, dynamic, and direct-to-composed horizon conditions. On the LeRobot PushT dataset, controlled falsification experiments show AUCs of 0.982 for one-step prediction-RMSE, 0.972 for standardized dynamics residuals, 0.592 for kinematic-only conditions, and 0.957 for the full gate; in replay-based intervention experiments, residual-based filters and the full gate prevent 87-89% of invalid proposals while preserving mean progress near 0.998.

Significance. If the proposed conditions are demonstrated to be sufficient and non-redundant for physical executability, the work offers a useful pre-execution filter for robotic prediction systems with interpretable rejection reasons. Strengths include the use of a public dataset, condition-level attribution in results, and quantitative reporting of AUC and prevention rates.

major comments (1)

[abstract (paragraph on formulation of physical admissibility)] The central claim relies on the assumption that the kinematic, dynamic, and direct-to-composed horizon conditions are jointly sufficient to identify all physical violations (i.e., no physically invalid proposal passes all three). However, no derivation, proof, or argument is provided to establish this sufficiency or non-redundancy; the evaluation is limited to empirical AUC on one dataset split without addressing potential unmodeled effects.

minor comments (2)

[abstract] The description of how the horizon conditions are computed, the exact setup of controlled falsification, and any error bars or statistical details on the AUC values are absent, which affects reproducibility and assessment of the reported numbers.
[abstract] The prevention rate is written as '87-$89%'; this appears to be a LaTeX artifact and should be corrected to '87-89%'.
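On the missing error bars: a percentile bootstrap over proposals is one standard way to attach an interval to each reported AUC. A minimal sketch on synthetic scores; the helper names, the 1,000 resamples, and the data are assumptions for illustration. If proposals from the same episode are correlated, resampling whole episodes rather than individual proposals would give more honest intervals.

```python
import numpy as np


def roc_auc(scores, labels):
    """Pairwise AUC: P(invalid score > valid score), ties counted as 1/2."""
    pos, neg = scores[labels], scores[~labels]
    return float((pos[:, None] > neg[None, :]).mean()
                 + 0.5 * (pos[:, None] == neg[None, :]).mean())


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC, resampling proposals with replacement."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        if labels[idx].all() or not labels[idx].any():
            continue  # a resample with only one class has no defined AUC
        stats.append(roc_auc(scores[idx], labels[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Synthetic detector scores: invalid proposals score higher on average.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.5, 1.0, 200)])
labels = np.concatenate([np.zeros(200, dtype=bool), np.ones(200, dtype=bool)])

point = roc_auc(scores, labels)
lo, hi = bootstrap_auc_ci(scores, labels)
print(f"AUC {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

When comparing detectors, sharing the same resampling indices across them (a paired bootstrap) would let a difference such as 0.972 versus 0.957 be judged on matched samples.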

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive comments. We address the major comment point by point below, clarifying the scope and intent of the formulation in the manuscript.


Referee: [abstract (paragraph on formulation of physical admissibility)] The central claim relies on the assumption that the kinematic, dynamic, and direct-to-composed horizon conditions are jointly sufficient to identify all physical violations (i.e., no physically invalid proposal passes all three). However, no derivation, proof, or argument is provided to establish this sufficiency or non-redundancy; the evaluation is limited to empirical AUC on one dataset split without addressing potential unmodeled effects.

Authors: We appreciate this observation but note that the manuscript does not advance a claim of joint sufficiency for identifying all possible physical violations. Physical admissibility is explicitly formulated as a prediction-control interface that rejects proposals violating the specified kinematic, dynamic, and direct-to-composed horizon conditions, with the abstract stating that 'rejection identifies violation of the specified physical envelope and gives a component-level reason.' These conditions define an operational filter rather than a complete physical model; no derivation or proof of completeness is provided because the work presents a practical, interpretable interface rather than a theorem establishing exhaustiveness. The evaluation uses controlled falsification on the public LeRobot PushT dataset to quantify detection performance (AUC 0.957 for the full gate) and intervention effectiveness (87-89% prevention of invalid proposals), with condition-level attribution. We acknowledge that the empirical scope is limited to one dataset split and does not address unmodeled effects outside the defined conditions, consistent with the paper's focus on this specific filter. revision: no

Circularity Check

0 steps flagged

No significant circularity; metrics derived from external dataset evaluation


The paper formulates physical admissibility via three families of conditions (kinematic, dynamic, direct-to-composed horizon) and reports AUC values plus intervention prevention rates on the external Hugging Face LeRobot PushT dataset via controlled falsification. No equations, fitted parameters, or self-citations are shown that reduce the reported AUC 0.957, 87-89% prevention figures, or condition-level attributions to quantities defined or fitted inside the same paper. The evaluation remains statistically independent of the condition definitions themselves, satisfying the requirement for an externally falsifiable measurement rather than a self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

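The central claim rests on the unstated premise that the three named condition families are both necessary and jointly sufficient for physical admissibility. No free parameters or invented entities are mentioned in the abstract, although the figure captions indicate that the runtime gate depends on an envelope quantile, an envelope margin, and a decision threshold.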

axioms (1)

domain assumption: Kinematic, dynamic, and direct-to-composed horizon conditions together identify violations of the physical envelope.
Invoked in the formulation of physical admissibility as the basis for rejection with a component-level reason.

pith-pipeline@v0.9.1-grok · 5694 in / 1323 out tokens · 18711 ms · 2026-06-30T12:59:34.756708+00:00 · methodology


Reference graph

Works this paper leans on

51 extracted references · 44 canonical work pages · 23 internal anchors

[1]

D. Ha and J. Schmidhuber. World models. arXiv preprint arXiv:1803.10122, 2018. URL https://arxiv.org/abs/1803.10122

[2]

K. Chua, R. Calandra, R. McAllister, and S. Levine. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In Advances in Neural Information Processing Systems, 2018. URL https://arxiv.org/abs/1805.12114

[3]

M. Janner, J. Fu, M. Zhang, and S. Levine. When to trust your model: Model-based policy optimization. In Advances in Neural Information Processing Systems, 2019. URL https://arxiv.org/abs/1906.08253

[4]

D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson. Learning latent dynamics for planning from pixels, 2019. URL https://arxiv.org/abs/1811.04551

[5]

D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi. Dream to control: Learning behaviors by latent imagination. In International Conference on Learning Representations, 2020. URL https://arxiv.org/abs/1912.01603

[6]

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, et al. RT-1: Robotics transformer for real-world control at scale, 2022. URL https://arxiv.org/abs/2212.06817

[7]

A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, X. Chen, K. Choromanski, T. Ding, D. Driess, A. Dubey, C. Finn, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control, 2023. URL https://arxiv.org/abs/2307.15818

[8]

Open X-Embodiment Collaboration. Open X-Embodiment: Robotic learning datasets and RT-X models, 2023. URL https://arxiv.org/abs/2310.08864

[9]

Octo Model Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, et al. Octo: An open-source generalist robot policy, 2024. URL https://arxiv.org/abs/2405.12213

[10]

M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, et al. OpenVLA: An open-source vision-language-action model, 2024. URL https://arxiv.org/abs/2406.09246

[11]

K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al. π0: A vision-language-action flow model for general robot control, 2024. URL https://arxiv.org/abs/2410.24164

[12]

M. Shukor, D. Aubakirova, F. Capuano, P. Kooijmans, S. Palma, A. Zouitine, M. Aractingi, C. Pascal, M. Russi, A. Marafioti, S. Alibert, M. Cord, T. Wolf, and R. Cadene. SmolVLA: A vision-language-action model for affordable and efficient robotics, 2025. URL https://arxiv.org/abs/2506.01844

[13]

NVIDIA, J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. Fan, et al. GR00T N1: An open foundation model for generalist humanoid robots, 2025. URL https://arxiv.org/abs/2503.14734

[14]

R. Cadene, S. Aliberts, F. Capuano, M. Aractingi, A. Zouitine, P. Kooijmans, J. Choghari, M. Russi, C. Pascal, S. Palma, M. Shukor, J. Moss, A. Soare, D. Aubakirova, Q. Lhoest, Q. Gallouédec, and T. Wolf. LeRobot: An open-source library for end-to-end robot learning. URL https://arxiv.org/abs/2602.22818
[16]

K. Kawaharazuka, J. Oh, J. Yamada, I. Posner, and Y. Zhu. Vision-language-action models for robotics: A review towards real-world applications, 2025. URL https://arxiv.org/abs/2510.07077

[17]

S. James, Z. Ma, D. R. Arrojo, and A. J. Davison. RLBench: The robot learning benchmark and learning environment. IEEE Robotics and Automation Letters, 5(2):3019–3026, 2020. doi:10.1109/LRA.2020.2974707. URL https://doi.org/10.1109/LRA.2020.2974707

[18]

T. Yu, D. Quillen, Z. He, R. Julian, A. Narayan, H. Shively, A. Bellathur, K. Hausman, C. Finn, and S. Levine. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Proceedings of the Conference on Robot Learning, pages 1094–1100, 2020. URL https://proceedings.mlr.press/v100/yu20a.html

[19]

O. Mees, L. Hermann, E. Rosete-Beas, and W. Burgard. CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks, 2021. URL https://arxiv.org/abs/2112.03227

[20]

B. Liu, Y. Zhu, C. Gao, Y. Feng, Q. Liu, Y. Zhu, and P. Stone. LIBERO: Benchmarking knowledge transfer for lifelong robot learning, 2023. URL https://arxiv.org/abs/2306.03310

[21]

A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, et al. DROID: A large-scale in-the-wild robot manipulation dataset, 2024. URL https://arxiv.org/abs/2403.12945

[22]

A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y. Zhu, and R. Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In Proceedings of the Conference on Robot Learning, 2021. URL https://arxiv.org/abs/2108.03298

[23]

N. M. M. Shafiullah, Z. J. Cui, A. Altanzaya, and L. Pinto. Behavior transformers: Cloning k modes with one stone. In Advances in Neural Information Processing Systems, 2022. URL https://arxiv.org/abs/2206.11251

[24]

C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion, 2023. URL https://arxiv.org/abs/2303.04137

[25]

T. Z. Zhao, V. Kumar, S. Levine, and C. Finn. Learning fine-grained bimanual manipulation with low-cost hardware, 2023. URL https://arxiv.org/abs/2304.13705

[26]

D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023. URL https://arxiv.org/abs/2301.04104

[27]

J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020. doi:10.1038/s41586-020-03051-4. URL https://doi.org/10.1038/s41586-020-03051-4

[28]

M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019. doi:10.1016/j.jcp.2018.10.045. URL https://doi.org/10.1016/j.jcp.2018.10.045

[29]

S. Greydanus, M. Dzamba, and J. Yosinski. Hamiltonian neural networks, 2019. URL https://arxiv.org/abs/1906.01563

[30]

M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, and S. Ho. Lagrangian neural networks, 2020. URL https://arxiv.org/abs/2003.04630

[31]

A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. W. Battaglia. Learning to simulate complex physics with graph networks. In Proceedings of the 37th International Conference on Machine Learning, pages 8459–8468, 2020. URL https://proceedings.mlr.press/v119/sanchez-gonzalez20a.html
[32]

C. Finn and S. Levine. Deep visual foresight for planning robot motion. In 2017 IEEE International Conference on Robotics and Automation, pages 2786–2793, 2017. doi:10.1109/ICRA.2017.7989324. URL https://arxiv.org/abs/1610.00696

[33]

F. Ebert, C. Finn, S. Dasari, A. Xie, A. Lee, and S. Levine. Visual foresight: Model-based deep reinforcement learning for vision-based robotic control. In Proceedings of the Conference on Robot Learning, 2018. URL https://arxiv.org/abs/1812.00568

[34]

J. García and F. Fernández. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1):1437–1480, 2015. URL https://jmlr.org/papers/v16/garcia15a.html

[35]

J. Achiam, D. Held, A. Tamar, and P. Abbeel. Constrained policy optimization. In Proceedings of the 34th International Conference on Machine Learning, pages 22–31, 2017. URL https://arxiv.org/abs/1705.10528

[36]

J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin. Bridging Hamilton-Jacobi safety analysis and reinforcement learning. In 2019 International Conference on Robotics and Automation, pages 8550–8556, 2019. doi:10.1109/ICRA.2019.8794107. URL https://doi.org/10.1109/ICRA.2019.8794107

[37]

I. M. Mitchell, A. M. Bayen, and C. J. Tomlin. A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games. IEEE Transactions on Automatic Control, 50(7):947–957, 2005. doi:10.1109/TAC.2005.851439. URL https://doi.org/10.1109/TAC.2005.851439

[38]

A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications. In 2019 18th European Control Conference, pages 3420–3431, 2019. doi:10.23919/ECC.2019.8796030. URL https://doi.org/10.23919/ECC.2019.8796030

[39]

K. P. Wabersich and M. N. Zeilinger. A predictive safety filter for learning-based control of constrained nonlinear dynamical systems. Automatica, 129:109597, 2021. doi:10.1016/j.automatica.2021.109597. URL https://doi.org/10.1016/j.automatica.2021.109597

[40]

K.-C. Hsu, H. Hu, and J. F. Fisac. The safety filter: A unified view of safety-critical control in autonomous systems. Annual Review of Control, Robotics, and Autonomous Systems, 7:47–72, 2024. doi:10.1146/annurev-control-071723-102940. URL https://doi.org/10.1146/annurev-control-071723-102940

[41]

D. Seto, B. H. Krogh, L. Sha, and A. Chutinan. The simplex architecture for safe online control system upgrades. In Proceedings of the 1998 American Control Conference, pages 3504–3508, 1998. doi:10.1109/ACC.1998.703255. URL https://doi.org/10.1109/ACC.1998.703255

[42]

M. Leucker and C. Schallhart. A brief account of runtime verification. The Journal of Logic and Algebraic Programming, 78(5):293–303, 2009. doi:10.1016/j.jlap.2008.08.004. URL https://doi.org/10.1016/j.jlap.2008.08.004

[43]

M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, and U. Topcu. Safe reinforcement learning via shielding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018. URL https://arxiv.org/abs/1708.08611

[44]

K. L. Hobbs, M. L. Mote, M. Abate, S. Coogan, and E. Feron. Run time assurance for safety-critical systems: An introduction to safety filtering approaches for complex control systems. IEEE Control Systems Magazine, 43(2):28–65, 2023. doi:10.1109/MCS.2023.3234380. URL https://doi.org/10.1109/MCS.2023.3234380

[45]

C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1321–1330, 2017. URL https://arxiv.org/abs/1706.04599

[47]

B. Lakshminarayanan, A. Pritzel, and C. Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, 2017. URL https://arxiv.org/abs/1612.01474

[49]

G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017. doi:10.1007/978-3-319-63387-9_5. URL https://doi.org/10.1007/978-3-319-63387-9_5

[50]

T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and M. Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy, pages 3–18, 2018. doi:10.1109/SP.2018.00058. URL https://doi.org/10.1109/SP.2018.00058

[51]

R. Ivanov, J. Weimer, R. Alur, G. J. Pappas, and I. Lee. Verisig: Verifying safety properties of hybrid systems with neural network controllers. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pages 169–178, 2019. doi:10.1145/3302504.3311806. URL https://arxiv.org/abs/1811.01828

[1] [1]

World Models

D. Ha and J. Schmidhuber. World models.arXiv preprint arXiv:1803.10122, 2018. URL https://arxiv.org/abs/1803.10122

work page internal anchor Pith review Pith/arXiv arXiv 2018

[2] [2]

K. Chua, R. Calandra, R. McAllister, and S. Levine. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. InAdvances in Neural Information Processing Systems, 2018. URLhttps://arxiv.org/abs/1805.12114

work page internal anchor Pith review Pith/arXiv arXiv 2018

[3] [3]

Janner, J

M. Janner, J. Fu, M. Zhang, and S. Levine. When to trust your model: Model-based policy optimization. InAdvances in Neural Information Processing Systems, 2019. URLhttps: //arxiv.org/abs/1906.08253

work page arXiv 2019

[4] [4]

Learning Latent Dynamics for Planning from Pixels

D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson. Learning latent dynamics for planning from pixels, 2019. URLhttps://arxiv.org/abs/1811.04551

work page internal anchor Pith review Pith/arXiv arXiv 2019

[5] [5]

Dream to Control: Learning Behaviors by Latent Imagination

D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi. Dream to control: Learning behaviors by latent imagination. InInternational Conference on Learning Representations, 2020. URL https://arxiv.org/abs/1912.01603

work page internal anchor Pith review Pith/arXiv arXiv 2020

[6] [6]

RT-1: Robotics Transformer for Real-World Control at Scale

A. Brohan, N. Brown, J. Carbajal, Y . Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Haus- man, A. Herzog, J. Hsu, et al. RT-1: Robotics transformer for real-world control at scale, 2022. URLhttps://arxiv.org/abs/2212.06817

work page internal anchor Pith review Pith/arXiv arXiv 2022

[7] [7]

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

A. Brohan, N. Brown, J. Carbajal, Y . Chebotar, X. Chen, K. Choromanski, T. Ding, D. Driess, A. Dubey, C. Finn, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control, 2023. URLhttps://arxiv.org/abs/2307.15818

work page internal anchor Pith review Pith/arXiv arXiv 2023

[8] [8]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Open X-Embodiment Collaboration. Open X-embodiment: Robotic learning datasets and RT- X models, 2023. URLhttps://arxiv.org/abs/2310.08864

work page internal anchor Pith review Pith/arXiv arXiv 2023

[9] Octo Model Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, et al. Octo: An open-source generalist robot policy, 2024. URL https://arxiv.org/abs/2405.12213

[10] M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, et al. OpenVLA: An open-source vision-language-action model, 2024. URL https://arxiv.org/abs/2406.09246

[11] K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al. $\pi_0$: A vision-language-action flow model for general robot control, 2024. URL https://arxiv.org/abs/2410.24164

[12] M. Shukor, D. Aubakirova, F. Capuano, P. Kooijmans, S. Palma, A. Zouitine, M. Aractingi, C. Pascal, M. Russi, A. Marafioti, S. Alibert, M. Cord, T. Wolf, and R. Cadene. SmolVLA: A vision-language-action model for affordable and efficient robotics, 2025. URL https://arxiv.org/abs/2506.01844

[13] NVIDIA, J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. Fan, et al. GR00T N1: An open foundation model for generalist humanoid robots, 2025. URL https://arxiv.org/abs/2503.14734

[14] R. Cadene, S. Aliberts, F. Capuano, M. Aractingi, A. Zouitine, P. Kooijmans, J. Choghari, M. Russi, C. Pascal, S. Palma, M. Shukor, J. Moss, A. Soare, D. Aubakirova, Q. Lhoest, Q. Gallouédec, and T. Wolf. LeRobot: An open-source library for end-to-end robot learning. URL https://arxiv.org/abs/2602.22818

[16] K. Kawaharazuka, J. Oh, J. Yamada, I. Posner, and Y. Zhu. Vision-language-action models for robotics: A review towards real-world applications, 2025. URL https://arxiv.org/abs/2510.07077

[17] S. James, Z. Ma, D. R. Arrojo, and A. J. Davison. RLBench: The robot learning benchmark and learning environment. IEEE Robotics and Automation Letters, 5(2):3019–3026, 2020. doi:10.1109/LRA.2020.2974707. URL https://doi.org/10.1109/LRA.2020.2974707

[18] T. Yu, D. Quillen, Z. He, R. Julian, A. Narayan, H. Shively, A. Bellathur, K. Hausman, C. Finn, and S. Levine. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Proceedings of the Conference on Robot Learning, pages 1094–1100, 2020. URL https://proceedings.mlr.press/v100/yu20a.html

[19] O. Mees, L. Hermann, E. Rosete-Beas, and W. Burgard. CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks, 2021. URL https://arxiv.org/abs/2112.03227

[20] B. Liu, Y. Zhu, C. Gao, Y. Feng, Q. Liu, Y. Zhu, and P. Stone. LIBERO: Benchmarking knowledge transfer for lifelong robot learning, 2023. URL https://arxiv.org/abs/2306.03310

[21] A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, et al. DROID: A large-scale in-the-wild robot manipulation dataset, 2024. URL https://arxiv.org/abs/2403.12945

[22] A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y. Zhu, and R. Martín-Martín. What matters in learning from offline human demonstrations for robot manipulation. In Proceedings of the Conference on Robot Learning, 2021. URL https://arxiv.org/abs/2108.03298

[23] N. M. M. Shafiullah, Z. J. Cui, A. Altanzaya, and L. Pinto. Behavior transformers: Cloning $k$ modes with one stone. In Advances in Neural Information Processing Systems, 2022. URL https://arxiv.org/abs/2206.11251

[24] C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion, 2023. URL https://arxiv.org/abs/2303.04137

[25] T. Z. Zhao, V. Kumar, S. Levine, and C. Finn. Learning fine-grained bimanual manipulation with low-cost hardware, 2023. URL https://arxiv.org/abs/2304.13705

[26] D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104, 2023. URL https://arxiv.org/abs/2301.04104

[27] J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020. doi:10.1038/s41586-020-03051-4. URL https://doi.org/10.1038/s41586-020-03051-4

[28] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019. doi:10.1016/j.jcp.2018.10.045. URL https://doi.org/10.1016/j.jcp.2018.10.045

[29] S. Greydanus, M. Dzamba, and J. Yosinski. Hamiltonian neural networks, 2019. URL https://arxiv.org/abs/1906.01563

[30] M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, and S. Ho. Lagrangian neural networks, 2020. URL https://arxiv.org/abs/2003.04630

[31] A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. W. Battaglia. Learning to simulate complex physics with graph networks. In Proceedings of the 37th International Conference on Machine Learning, pages 8459–8468, 2020. URL https://proceedings.mlr.press/v119/sanchez-gonzalez20a.html

[32] C. Finn and S. Levine. Deep visual foresight for planning robot motion. In 2017 IEEE International Conference on Robotics and Automation, pages 2786–2793, 2017. doi:10.1109/ICRA.2017.7989324. URL https://arxiv.org/abs/1610.00696

[33] F. Ebert, C. Finn, S. Dasari, A. Xie, A. Lee, and S. Levine. Visual foresight: Model-based deep reinforcement learning for vision-based robotic control. In Proceedings of the Conference on Robot Learning, 2018. URL https://arxiv.org/abs/1812.00568

[34] J. García and F. Fernández. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1):1437–1480, 2015. URL https://jmlr.org/papers/v16/garcia15a.html

[35] J. Achiam, D. Held, A. Tamar, and P. Abbeel. Constrained policy optimization. In Proceedings of the 34th International Conference on Machine Learning, pages 22–31, 2017. URL https://arxiv.org/abs/1705.10528

[36] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin. Bridging Hamilton-Jacobi safety analysis and reinforcement learning. In 2019 International Conference on Robotics and Automation, pages 8550–8556, 2019. doi:10.1109/ICRA.2019.8794107. URL https://doi.org/10.1109/ICRA.2019.8794107

[37] I. M. Mitchell, A. M. Bayen, and C. J. Tomlin. A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games. IEEE Transactions on Automatic Control, 50(7):947–957, 2005. doi:10.1109/TAC.2005.851439. URL https://doi.org/10.1109/TAC.2005.851439

[38] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications. In 2019 18th European Control Conference, pages 3420–3431, 2019. doi:10.23919/ECC.2019.8796030. URL https://doi.org/10.23919/ECC.2019.8796030

[39] K. P. Wabersich and M. N. Zeilinger. A predictive safety filter for learning-based control of constrained nonlinear dynamical systems. Automatica, 129:109597, 2021. doi:10.1016/j.automatica.2021.109597. URL https://doi.org/10.1016/j.automatica.2021.109597

[40] K.-C. Hsu, H. Hu, and J. F. Fisac. The safety filter: A unified view of safety-critical control in autonomous systems. Annual Review of Control, Robotics, and Autonomous Systems, 7:47–72, 2024. doi:10.1146/annurev-control-071723-102940. URL https://doi.org/10.1146/annurev-control-071723-102940

[41] D. Seto, B. H. Krogh, L. Sha, and A. Chutinan. The simplex architecture for safe online control system upgrades. In Proceedings of the 1998 American Control Conference, pages 3504–3508, 1998. doi:10.1109/ACC.1998.703255. URL https://doi.org/10.1109/ACC.1998.703255

[42] M. Leucker and C. Schallhart. A brief account of runtime verification. The Journal of Logic and Algebraic Programming, 78(5):293–303, 2009. doi:10.1016/j.jlap.2008.08.004. URL https://doi.org/10.1016/j.jlap.2008.08.004

[43] M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, and U. Topcu. Safe reinforcement learning via shielding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018. URL https://arxiv.org/abs/1708.08611

[44] K. L. Hobbs, M. L. Mote, M. Abate, S. Coogan, and E. Feron. Run time assurance for safety-critical systems: An introduction to safety filtering approaches for complex control systems. IEEE Control Systems Magazine, 43(2):28–65, 2023. doi:10.1109/MCS.2023.3234380. URL https://doi.org/10.1109/MCS.2023.3234380

[45] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, pages 1321–1330, 2017. URL https://arxiv.org/abs/1706.04599

[47] B. Lakshminarayanan, A. Pritzel, and C. Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, 2017. URL https://arxiv.org/abs/1612.01474

[49] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017. doi:10.1007/978-3-319-63387-9_5. URL https://doi.org/10.1007/978-3-319-63387-9_5

[50] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and M. Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy, pages 3–18, 2018. doi:10.1109/SP.2018.00058. URL https://doi.org/10.1109/SP.2018.00058

[51] R. Ivanov, J. Weimer, R. Alur, G. J. Pappas, and I. Lee. Verisig: Verifying safety properties of hybrid systems with neural network controllers. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control, pages 169–178, 2019. doi:10.1145/3302504.3311806. URL https://arxiv.org/abs/1811.01828

Appendix A Proof of Propo...