pith. sign in

arxiv: 2412.12870 · v6 · submitted 2024-12-17 · 💻 cs.LG

Physically Interpretable World Models via Weakly Supervised Representation Learning

Pith reviewed 2026-05-23 06:58 UTC · model grok-4.3

classification 💻 cs.LG
keywords world modelsrepresentation learningweak supervisionphysical interpretabilitylatent dynamicscyber-physical systemsimage-based prediction
0
0 comments X

The pith

PIWM learns world models whose latent states correspond to physical variables and evolve according to known dynamics using only weak distribution-based supervision from images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PIWM as a way to give world models physical interpretability by making their latent states match real physical quantities and their changes obey partially known physical equations. It achieves this without ground-truth annotations by applying weak supervision that reflects the natural uncertainty in real sensing. A sympathetic reader would care because standard learned world models produce opaque latents that hinder reliability in safety-critical control tasks. The method combines a vector-quantized visual encoder, a transformer physical encoder, and a dynamics model tied to known equations, and it is tested on three image-based control environments. If the approach works, it shows that partial physical knowledge plus distributional supervision can ground representations without full labels.

Core claim

PIWM defines physical interpretability as latent states that match meaningful physical variables and temporal evolution that follows physically consistent dynamics; it realizes this through weak distribution-based supervision on state uncertainty together with a VQ visual encoder, transformer physical encoder, and learnable dynamics grounded in known equations, yielding accurate long-horizon prediction and recovery of true system parameters on Cart Pole, Lunar Lander, and Donkey Car without ground-truth annotations.

What carries the argument

The PIWM architecture that integrates weak distribution-based supervision with a VQ-based visual encoder, transformer-based physical encoder, and dynamics model constrained by known physical equations to enforce alignment and consistency.

If this is right

  • Accurate long-horizon prediction becomes possible directly from raw images.
  • True system parameters can be recovered without explicit annotations.
  • Physical grounding improves markedly compared with purely data-driven world models.
  • The same weak-supervision approach works across multiple distinct control domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may reduce the data requirements for training reliable controllers in new physical environments.
  • Partial knowledge of dynamics equations could be combined with this supervision to handle partially observed or noisy real-world sensors.
  • If the alignment holds, the resulting models could support verification or safety analysis that treats the latent states as interpretable physical quantities.

Load-bearing premise

Weak distribution-based supervision derived from sensing uncertainty is enough to force latent states to align with actual physical variables and to keep their dynamics physically consistent.

What would settle it

On a held-out image-based control task the trained PIWM model produces long-horizon rollouts whose recovered parameters deviate substantially from the known ground-truth values or whose predicted trajectories violate the known physical constraints.

Figures

Figures reproduced from arXiv: 2412.12870 by Ivan Ruchkin, Mrinall Eashaan Umasudhan, Zhenjiang Mao.

Figure 1
Figure 1. Figure 1: Conceptual illustration of learning physically mean [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of world model architectures. (Left) Standard world model uses an encoder-decoder structure with a [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: Prediction performance of CartPole. The Root Mean [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Learned physical parameters vs. ground truth. For Cart Pole and Lunar Lander, the parameters learned by our [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative comparison of 30-step trajectory rollouts under [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Extrinsic–discrete PIWM architecture. The vision encoder [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Qualitative comparison of 30-step trajectory roll [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗
read the original abstract

Learning predictive models from high-dimensional sensory observations is fundamental for cyber-physical systems, yet the latent representations learned by standard world models lack physical interpretability. This limits their reliability, generalizability, and applicability to safety-critical tasks. We introduce Physically Interpretable World Models (PIWM), a framework that aligns latent representations with real-world physical quantities and constrains their evolution through partially known physical dynamics. Physical interpretability in PIWM is defined by two complementary properties: (i) the learned latent state corresponds to meaningful physical variables, and (ii) its temporal evolution follows physically consistent dynamics. To achieve this without requiring ground-truth physical annotations, PIWM employs weak distribution-based supervision that captures state uncertainty naturally arising from real-world sensing pipelines. The architecture integrates a VQ-based visual encoder, a transformer-based physical encoder, and a learnable dynamics model grounded in known physical equations. Across three case studies (Cart Pole, Lunar Lander, and Donkey Car), PIWM achieves accurate long-horizon prediction, recovers true system parameters, and significantly improves physical grounding over purely data-driven models. These results demonstrate the feasibility and advantages of learning physically interpretable world models directly from images under weak supervision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces Physically Interpretable World Models (PIWM), a framework that learns latent representations from high-dimensional images aligned with physical variables and whose dynamics follow partially known physical equations. Physical interpretability is defined via two properties: correspondence of latents to meaningful physical variables and physically consistent temporal evolution. The method uses weak distribution-based supervision (capturing sensing uncertainty) together with a VQ visual encoder, transformer physical encoder, and learnable dynamics model. On Cart Pole, Lunar Lander, and Donkey Car, the paper claims accurate long-horizon prediction, recovery of true system parameters, and improved physical grounding relative to purely data-driven baselines.

Significance. If substantiated, the approach would offer a practical route to interpretable world models for cyber-physical systems without requiring ground-truth state annotations. The reliance on weak distributional supervision and partial physics knowledge is a potentially valuable contribution for real-world sensing pipelines, and successful parameter recovery would strengthen the case for using such models in safety-critical settings.

major comments (1)
  1. [Abstract] Abstract: the claims of accurate long-horizon prediction, true parameter recovery, and significantly improved physical grounding are stated without any quantitative metrics, ablation results, or experimental protocol details, preventing evaluation of whether the central claims hold.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The single major comment concerns the abstract's lack of quantitative support for its claims. We address this directly below and will revise the abstract accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claims of accurate long-horizon prediction, true parameter recovery, and significantly improved physical grounding are stated without any quantitative metrics, ablation results, or experimental protocol details, preventing evaluation of whether the central claims hold.

    Authors: We agree that the abstract would be strengthened by including representative quantitative results. The full manuscript reports these details in Sections 4 and 5 (e.g., long-horizon MSE on Cart Pole/Lunar Lander/Donkey Car, parameter recovery errors with standard deviations, and ablation studies comparing against data-driven baselines). To make the abstract self-contained, we will revise it to incorporate key metrics such as prediction accuracy over 50-step horizons and parameter estimation errors, while retaining the high-level summary. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation chain integrates externally known physical equations into the dynamics model and applies weak distribution-based supervision to align latents, without any reduction of predictions or parameter recovery to the supervision signal by construction. The VQ/transformer architecture and case-study verification against independent ground truth remain independent of the fitted values. No self-citation load-bearing steps, self-definitional quantities, or ansatz smuggling appear in the abstract or described framework.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review prevents exhaustive enumeration; the approach assumes standard components (VQ encoder, transformer) and domain knowledge of physical equations.

axioms (1)
  • domain assumption Partially known physical dynamics can be integrated into a learnable model to constrain latent evolution
    Stated in the description of the dynamics model grounded in known physical equations.

pith-pipeline@v0.9.0 · 5747 in / 1207 out tokens · 27510 ms · 2026-05-23T06:58:05.450907+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 9 internal anchors

  1. [1]

    Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, and Ufuk Topcu. 2018. Safe reinforcement learning via shielding. In Proceedings of the AAAI conference on artificial intelligence, Vol. 32

  2. [2]

    Martin Asenov, Michael Burke, Daniel Angelov, Todor Davchev, Kartic Subr, and Subramanian Ramamoorthy. 2019. Vid2param: Modeling of dynamics parameters from video.IEEE Robotics and Automation Letters5, 2 (2019), 414–421

  3. [3]

    Shaojie Bai, J Zico Kolter, and Vladlen Koltun. 2018. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling.arXiv preprint arXiv:1803.01271(2018)

  4. [4]

    Jonathan Balloch, Zhiyu Lin, Robert Wright, Xiangyu Peng, Mustafa Hussain, Aarun Srinivas, Julia Kim, and Mark O Riedl. 2023. Neuro-symbolic world models for adapting to open world novelty.arXiv preprint arXiv:2301.06294(2023)

  5. [5]

    Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schul- man, Jie Tang, and Wojciech Zaremba. 2016. Openai gym.arXiv preprint arXiv:1606.01540(2016)

  6. [6]

    Brunton, Joshua L

    Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. 2016. Sparse Identifi- cation of Nonlinear Dynamics with Control (SINDYc).IFAC-PapersOnLine49, 18 (2016), 710–715. doi:10.1016/j.ifacol.2016.10.249 10th IFAC Symposium on Nonlinear Control Systems NOLCOS 2016

  7. [7]

    Boyuan Chen, Kuang Huang, Sunand Raghupathi, Ishaan Chandratreya, Qiang Du, and Hod Lipson. 2022. Automated discovery of fundamental variables hidden in experimental data.Nature Computational Science2, 7 (2022), 433–442

  8. [8]

    Ricky TQ Chen, Xuechen Li, Roger B Grosse, and David K Duvenaud. 2018. Isolating sources of disentanglement in variational autoencoders.Advances in neural information processing systems31 (2018)

  9. [9]

    Henggang Cui, Thi Nguyen, Fang-Chieh Chou, Tsung-Han Lin, Jeff Schneider, David Bradley, and Nemanja Djuric. 2020. Deep kinematic models for kinemati- cally feasible vehicle trajectory predictions. In2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 10563–10569

  10. [10]

    Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J Zico Kolter. 2018. End-to-end differentiable physics for learning and control. Advances in neural information processing systems31 (2018)

  11. [11]

    Franck Djeumou, Cyrus Neary, and Ufuk Topcu. 2023. How to learn and gen- eralize from three minutes of data: Physics-constrained and uncertainty-aware neural stochastic differential equations.arXiv preprint arXiv:2306.06335(2023)

  12. [12]

    Stefano Ferraro, Pietro Mazzaglia, Tim Verbelen, and Bart Dhoedt. 2023. Fo- cus: Object-centric world models for robotics manipulation.arXiv preprint arXiv:2307.02427(2023)

  13. [13]

    David Fridovich-Keil, Andrea Bajcsy, Jaime F Fisac, Sylvia L Herbert, Steven Wang, Anca D Dragan, and Claire J Tomlin. 2020. Confidence-aware motion prediction for real-time collision avoidance1.The International Journal of Robotics Research39, 2-3 (2020), 250–265

  14. [14]

    Xiang Fu, Ge Yang, Pulkit Agrawal, and Tommi Jaakkola. 2021. Learning task informed abstractions. InInternational Conference on Machine Learning. PMLR, 3480–3491

  15. [15]

    David Ha and Jürgen Schmidhuber. 2018. World models.arXiv preprint arXiv:1803.10122(2018)

  16. [16]

    Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba. 2020. Mastering atari with discrete world models.arXiv preprint arXiv:2010.02193 (2020)

  17. [17]

    Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. 2023. Mas- tering diverse domains through world models.arXiv preprint arXiv:2301.04104 (2023)

  18. [18]

    Osman Hasan and Sofiene Tahar. 2015. Formal verification methods. InEncyclo- pedia of Information Science and Technology, Third Edition. IGI global, 7162–7170

  19. [19]

    Irina Higgins, Loic Matthey, Arka Pal, Christopher P Burgess, Xavier Glorot, Matthew M Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-vae: Learning basic visual concepts with a constrained variational framework.ICLR (Poster)3 (2017)

  20. [20]

    Kai-Chieh Hsu, Karen Leung, Yuxiao Chen, Jaime F Fisac, and Marco Pavone. 2023. Interpretable Trajectory Prediction for Autonomous Vehicles via Counterfactual Responsibility. In2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 5918–5925

  21. [21]

    Ahmed Hussein, Mohamed Medhat Gaber, Eyad Elyan, and Chrisina Jayne. 2017. Imitation learning: A survey of learning methods.ACM Computing Surveys (CSUR)50, 2 (2017), 1–35

  22. [22]

    Masha Itkina and Mykel Kochenderfer. 2023. Interpretable self-aware neural networks for robust trajectory prediction. InConference on Robot Learning. PMLR, 606–617

  23. [23]

    Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. 1996. Rein- forcement learning: A survey.Journal of artificial intelligence research4 (1996), 237–285

  24. [24]

    Maximilian Karl, Maximilian Soelch, Justin Bayer, and Patrick Van der Smagt

  25. [25]

    Deep variational bayes filters: Unsupervised learning of state space models from raw data.arXiv preprint arXiv:1605.06432(2016)

  26. [26]

    Sydney M Katz, Anthony L Corso, Christopher A Strong, and Mykel J Kochender- fer. 2022. Verification of image-based neural network controllers using generative models.Journal of Aerospace Information Systems19, 9 (2022), 574–584

  27. [27]

    Hyunjik Kim and Andriy Mnih. 2018. Disentangling by factorising. InInterna- tional conference on machine learning. PMLR, 2649–2658

  28. [28]

    Diederik P Kingma. 2013. Auto-encoding variational bayes.arXiv preprint arXiv:1312.6114(2013)

  29. [29]

    George Konidaris, Leslie Pack Kaelbling, and Tomas Lozano-Perez. 2018. From skills to symbols: Learning symbolic representations for abstract high-level planning.Journal of Artificial Intelligence Research61 (2018), 215–289

  30. [30]

    Long Le, Ryan Lucas, Chen Wang, Chuhao Chen, Dinesh Jayaraman, Eric Eaton, and Lingjie Liu. 2025. Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels.arXiv preprint arXiv:2508.17437(2025)

  31. [31]

    Anson Lei, Bernhard Schölkopf, and Ingmar Posner. 2024. SPARTAN: A sparse transformer learning local causation.arXiv preprint arXiv:2411.06890(2024)

  32. [32]

    Anjian Li, Liting Sun, Wei Zhan, Masayoshi Tomizuka, and Mo Chen. 2021. Prediction-based reachability for collision avoidance in autonomous driving. In 2021 Intl. Conf. on Robotics and Automation (ICRA)

  33. [33]

    Yichao Liang, Nishanth Kumar, Hao Tang, Adrian Weller, Joshua B Tenenbaum, Tom Silver, João F Henriques, and Kevin Ellis. 2024. Visualpredicator: Learning abstract world models with neuro-symbolic predicates for robot planning.arXiv preprint arXiv:2410.23156(2024)

  34. [34]

    TP Lillicrap. 2015. Continuous control with deep reinforcement learning.arXiv preprint arXiv:1509.02971(2015)

  35. [35]

    Lars Lindemann, Xin Qin, Jyotirmoy V Deshmukh, and George J Pappas. 2023. Conformal prediction for stl runtime verification. InProceedings of the ACM/IEEE 14th International Conference on Cyber-Physical Systems (with CPS-IoT Week 2023). 142–153

  36. [36]

    Ori Linial, Neta Ravid, Danny Eytan, and Uri Shalit. 2021. Generative ODE modeling with known unknowns. InProceedings of the Conference on Health, Inference, and Learning (CHIL ’21). Association for Computing Machinery, New York, NY, USA, 79–94. doi:10.1145/3450439.3451866

  37. [37]

    Juanwu Lu, Wei Zhan, Masayoshi Tomizuka, and Yeping Hu. 2024. Towards Generalizable and Interpretable Motion Prediction: A Deep Variational Bayes Approach. InInternational Conference on Artificial Intelligence and Statistics. PMLR, 4717–4725

  38. [38]

    Yanbing Mao, Yuliang Gu, Lui Sha, Huajie Shao, Qixin Wang, and Tarek Ab- delzaher. 2025. Phy-Taylor: Partially Physics-Knowledge-Enhanced Deep Neural Networks via NN Editing.IEEE Transactions on Neural Networks and Learning Systems36, 1 (Jan. 2025), 447–461. doi:10.1109/TNNLS.2023.3325432

  39. [39]

    Zhenjiang Mao, Siqi Dai, Yuang Geng, and Ivan Ruchkin. 2024. Zero-shot Safety Prediction for Autonomous Robots with Foundation World Models.arXiv preprint arXiv:2404.00462(2024)

  40. [40]

    Vincent Micheli, Eloi Alonso, and François Fleuret. 2022. Transformers are sample-efficient world models.arXiv preprint arXiv:2209.00588(2022)

  41. [41]

    Chen Min, Dawei Zhao, Liang Xiao, Yiming Nie, and Bin Dai. 2023. Uniworld: Au- tonomous driving pre-training via world models.arXiv preprint arXiv:2308.07234 (2023)

  42. [42]

    Malte Mosbach, Jan Niklas Ewertz, Angel Villar-Corrales, and Sven Behnke. 2025. SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels. arXiv:2410.08822 [cs.LG] https://arxiv.org/abs/2410.08822

  43. [43]

    Kensuke Nakamura and Somil Bansal. 2023. Online update of safety assurances using confidence-based predictions. In2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 12765–12771

  44. [44]

    Andi Peng, Mycal Tucker, Eoin Kenny, Noga Zaslavsky, Pulkit Agrawal, and Julie A Shah. 2024. Human-guided complexity-controlled abstractions.Advances in Neural Information Processing Systems36 (2024)

  45. [45]

    Jordan Peper, Zhenjiang Mao, Yuang Geng, Siyuan Pan, and Ivan Ruchkin

  46. [46]

    Four Principles for Physically Interpretable World Models.arXiv preprint arXiv:2503.02143(2025)

  47. [47]

    Ali Razavi, Aaron Van den Oord, and Oriol Vinyals. 2019. Generating diverse high-fidelity images with vq-vae-2.Advances in neural information processing systems32 (2019)

  48. [48]

    Ivan Ruchkin, Matthew Cleaveland, Radoslav Ivanov, Pengyuan Lu, Taylor Car- penter, Oleg Sokolsky, and Insup Lee. 2022. Confidence composition for monitors of verification assumptions. In2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS). IEEE, 1–12

  49. [49]

    Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone. 2020. Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data. InComputer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVIII 16. Springer, 683–700

  50. [50]

    Dylan Sam, Rattana Pukdee, Daniel P Jeong, Yewon Byun, and J Zico Kolter

  51. [51]

    Bayesian neural networks with domain knowledge priors.arXiv preprint arXiv:2402.13410(2024)

  52. [52]

    Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, and Pieter Abbeel. 2023. Masked world models for visual control. InConference on Robot Learning. PMLR, 1332–1344. Mao et al

  53. [53]

    Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting.Advances in neural information processing systems28 (2015)

  54. [54]

    Kaustubh Sridhar, Souradeep Dutta, James Weimer, and Insup Lee. 2023. Guaran- teed conformance of neurosymbolic models to natural constraints. InLearning for Dynamics and Control Conference. PMLR, 76–89

  55. [55]

    Renukanandan Tumu, Lars Lindemann, Truong Nghiem, and Rahul Mangharam

  56. [56]

    In 2023 IEEE Intelligent Vehicles Symposium (IV)

    Physics constrained motion prediction with uncertainty quantification. In 2023 IEEE Intelligent Vehicles Symposium (IV). IEEE, 1–8

  57. [57]

    Aaron Van Den Oord, Oriol Vinyals, et al. 2017. Neural discrete representation learning.Advances in neural information processing systems30 (2017)

  58. [58]

    Ari Viitala, Rinu Boney, Yi Zhao, Alexander Ilin, and Juho Kannala. 2021. Learning to drive (L2D) as a low-cost benchmark for real-world reinforcement learning. In 2021 20th International Conference on Advanced Robotics (ICAR). IEEE, 275–281

  59. [59]

    Masaki Waga, Ezequiel Castellano, Sasinee Pruekprasert, Stefan Klikovits, Toru Takisaka, and Ichiro Hasuo. 2022. Dynamic shielding for reinforcement learning in black-box environments. InInternational Symposium on Automated Technology for Verification and Analysis. Springer, 25–41

  60. [60]

    Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, and Pieter Abbeel. 2023. Any-point trajectory modeling for policy learning.arXiv preprint arXiv:2401.00025(2023)

  61. [61]

    Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Gold- berg. 2023. Daydreamer: World models for physical robot learning. InConference on robot learning. PMLR, 2226–2240

  62. [62]

    Yifan Xue, Michael Ding, and Xinghua Lu. 2019. Supervised vector quantized variational autoencoder for learning interpretable global representations.arXiv preprint arXiv:1909.11124(2019)

  63. [63]

    Ziyang Yan, Wenzhen Dong, Yihua Shao, Yuhang Lu, Liu Haiyang, Jingwen Liu, Haozhe Wang, Zhe Wang, Yan Wang, Fabio Remondino, et al. 2024. Renderworld: World model with self-supervised 3d label.arXiv preprint arXiv:2409.11356(2024)

  64. [64]

    Mengyue Yang, Furui Liu, Zhitang Chen, Xinwei Shen, Jianye Hao, and Jun Wang

  65. [65]

    InProceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Causalvae: Disentangled representation learning via neural structural causal models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 9593–9602

  66. [66]

    Dingling Yao, Caroline Muller, and Francesco Locatello. 2024. Marrying causal representation learning with dynamical systems for science.Advances in Neural Information Processing Systems37 (2024), 71705–71736

  67. [67]

    Zhe Zhao, Mengshi Qi, and Huadong Ma. 2024. Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation. InEuropean Conference on Computer Vision. Springer, 447–463

  68. [68]

    Wenzhao Zheng, Weiliang Chen, Yuanhui Huang, Borui Zhang, Yueqi Duan, and Jiwen Lu. 2024. Occworld: Learning a 3d occupancy world model for autonomous driving. InEuropean conference on computer vision. Springer, 55–72

  69. [69]

    Weiheng Zhong and Hadi Meidani. 2023. PI-VAE: Physics-Informed Variational Auto-Encoder for stochastic differential equations.Computer Methods in Applied Mechanics and Engineering403 (2023), 115664. doi:10.1016/j.cma.2022.115664

  70. [70]

    Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, and Jiwen Lu. 2024. Gaussianworld: Gaussian world model for streaming 3d occupancy prediction. arXiv preprint arXiv:2412.10373(2024). Physically Interpretable World Models via Weakly Supervised Representation Learning Appendix Model Architecture and Training Details.This appendix pro- vides detailed imp...

  71. [71]

    The dynamics model learns between 4 and 10 environment- specific parameters using 50 weak supervision samples per time step

    Intrinsic variants instead embed the physical portion directly into 𝑧∗ 𝑡 . The dynamics model learns between 4 and 10 environment- specific parameters using 50 weak supervision samples per time step. I. Training Hyperparameters in Text.All experiments use batch size 32, cosine learning-rate decay with 5 warmup epochs, a maxi- mum of 200 epochs with patien...