pith. machine review for the scientific record. sign in

arxiv: 2605.14398 · v1 · submitted 2026-05-14 · 💻 cs.AI

Recognition: 2 theorem links

· Lean Theorem

Coding Agent Is Good As World Simulator

Authors on Pith no claims yet

Pith reviewed 2026-05-15 02:15 UTC · model grok-4.3

classification 💻 cs.AI
keywords agentcodesimulationmodelsphysicalvisualframeworkworld
0
0 comments X

The pith

A multi-agent framework generates and refines executable physics simulation code from prompts to create world models that enforce physical constraints, claiming superior accuracy and fidelity over video-based alternatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

World models help AI systems predict and interact with environments. Most recent ones generate future video frames from past ones, but these often produce unrealistic results such as objects passing through each other or unstable motion because they learn statistical patterns without explicit physical rules. The paper instead uses software agents to write actual simulation code. A planning agent converts a text prompt into a structured scene description. A code agent implements it using a physics engine. A visual review agent examines the rendered output and a physics analysis agent checks consistency with rules like gravity, contacts, and momentum. The code is revised based on their feedback until the simulation matches the prompt and physical constraints. The resulting code can be executed step by step, allowing interactive use. The abstract reports that this code-based method achieves better physical accuracy, instruction following, and visual quality than advanced video models, with potential uses in driving simulation and robot training where correct physics matters for safe learning.

Core claim

Experimental results show that our framework outperforms advanced video-based models in physical accuracy, instruction fidelity and visual quality, which could be applied to various scenarios including driving simulation and embodied robot tasks.

Load-bearing premise

The assumption that the visual review and physics analysis agents can reliably detect and guide corrections for physical inconsistencies in generated code without ground-truth physics data or human intervention, allowing the iterative process to converge to valid simulations.

Figures

Figures reproduced from arXiv: 2605.14398 by Bocheng Zou, Dan Negrut, Hongyu Wang, Jingquan Wang, Radu Serban.

Figure 1
Figure 1. Figure 1: multi-agent pipeline. matches the confirmed plan. If it does, the system proceeds to the next stage. Otherwise, the validator returns a structured error report, and the code agent patches the current program. The loop ends until the final program satisfies all steps in the plan. 2.3 Plan Agent The plan agent converts an underspecified user request into a simulator-oriented plan before code generation, whic… view at source ↗
Figure 2
Figure 2. Figure 2: Robot in office scene. 25 [PITH_FULL_IMAGE:figures/full_fig_p025_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Vehicle in outdoor scene. 26 [PITH_FULL_IMAGE:figures/full_fig_p026_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Vehicle through FSI ground. 27 [PITH_FULL_IMAGE:figures/full_fig_p027_4.png] view at source ↗
read the original abstract

World models have emerged as a powerful paradigm for building interactive simulation environments, with recent video-based approaches demonstrating impressive progress in generating visually plausible dynamics. However, because these models typically infer dynamics from video and represent them in latent states, they do not explicitly enforce physical constraints. As a result, the generated video rollouts are not physically plausible, exhibiting unstable contacts, distorted shapes, or inconsistent motion. In this paper, we present an agentic framework constructing physics-based world models through executable simulation code. The framework coordinates planning, code generation, visual review, and physics analysis agents. The planning agent converts the natural language prompt into a structured scene plan, the code agent implements it as executable simulation code, and the visual review agent provide visual feedback while the physics analysis agent checks physical consistency. The code is iteratively revised based on the feedback until the simulation matches the prompt reqirements and physical constraints. Experimental results show that our framework outperforms advanced video-based models in physical accuracy, instruction fidelity and visual quality, which could be applied to various scenarios including driving simulation and embodied robot tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents an agentic framework for constructing physics-based world models via executable simulation code. A planning agent converts natural-language prompts into scene plans, a code agent generates simulation code, and visual-review plus physics-analysis agents provide iterative feedback to revise the code until it satisfies both prompt requirements and physical constraints. The central claim is that this code-based approach outperforms advanced video-based world models in physical accuracy, instruction fidelity, and visual quality, with applications to driving simulation and embodied robotics.

Significance. If the empirical results hold, the work would offer a concrete alternative to latent video dynamics by enforcing explicit, executable physics, which could improve controllability and long-horizon consistency in interactive simulators.

major comments (2)
  1. [Abstract] Abstract: the assertion that the framework 'outperforms advanced video-based models in physical accuracy, instruction fidelity and visual quality' is presented without any metrics, baselines, dataset descriptions, or experimental methodology. This absence leaves the central empirical claim unsupported in the provided text.
  2. [Abstract] Framework description (Abstract): the physics analysis agent is described as checking 'physical consistency' and driving code revisions solely via visual feedback and its own reasoning, with no reference to ground-truth trajectories, an external physics engine, or formal verification. Without such grounding it is unclear how the agent can reliably detect or correct violations such as unstable contacts or inconsistent motion, undermining attribution of any reported superiority to enforced physics rather than agent self-consistency.
minor comments (1)
  1. [Abstract] Abstract: 'reqirements' is a typo and should read 'requirements'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below. We agree that the abstract requires strengthening to better support the empirical claims and will revise it accordingly while preserving the core contribution of the agentic code-based framework.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the assertion that the framework 'outperforms advanced video-based models in physical accuracy, instruction fidelity and visual quality' is presented without any metrics, baselines, dataset descriptions, or experimental methodology. This absence leaves the central empirical claim unsupported in the provided text.

    Authors: We agree that the abstract, due to length constraints, omits specific metrics, baselines, and methodology details. The full manuscript (Section 4) provides these: comparisons against video models (e.g., specific baselines like those in recent works on video dynamics) using quantitative metrics such as physical violation counts, instruction adherence scores, and visual quality assessments on datasets including driving and robotics scenarios. In the revision, we will expand the abstract with a concise statement of key results (e.g., 'outperforms baselines by X% in physical accuracy on Y benchmark') to make the claim self-contained while directing readers to the experiments. revision: yes

  2. Referee: [Abstract] Framework description (Abstract): the physics analysis agent is described as checking 'physical consistency' and driving code revisions solely via visual feedback and its own reasoning, with no reference to ground-truth trajectories, an external physics engine, or formal verification. Without such grounding it is unclear how the agent can reliably detect or correct violations such as unstable contacts or inconsistent motion, undermining attribution of any reported superiority to enforced physics rather than agent self-consistency.

    Authors: The physics analysis agent detects violations by rendering simulation outputs and applying its pre-trained knowledge of physical laws to analyze motion, contacts, and stability directly from the visual feedback and generated code. This LLM-driven reasoning identifies issues like unstable contacts or inconsistent trajectories without external engines, allowing iterative code revisions to enforce constraints explicitly in the executable simulation. The superiority stems from the final code being physically grounded and runnable, unlike latent video models. We will revise the abstract and add a methods subsection with prompt examples and case studies of detected/corrected violations to clarify the process. revision: partial

Circularity Check

0 steps flagged

No circularity: agentic framework is an external iterative loop with no self-definitional reductions or fitted predictions

full rationale

The paper presents a multi-agent system (planning, code generation, visual review, physics analysis) that generates and refines executable simulation code via feedback until prompt and constraint satisfaction. No equations, parameters, or derivations appear that reduce to their own inputs by construction. Claims of outperformance rest on external experimental comparisons to video models rather than self-referential metrics or self-citations. The physics analysis step operates on visual and reasoning feedback without being defined in terms of the final output. This is a standard engineering description of an iterative pipeline and remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that specialized agents can iteratively produce and verify correct physics code from prompts; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Specialized agents can generate, review, and iteratively refine executable simulation code to satisfy both natural language prompts and physical constraints
    The framework description assumes this capability without detailing mechanisms or providing evidence for reliable convergence.

pith-pipeline@v0.9.0 · 5489 in / 1333 out tokens · 50960 ms · 2026-05-15T02:15:46.574907+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 9 internal anchors

  1. [1]

    Recurrent world models facilitate policy evolution

    David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Gar- nett, editors,Advances in Neural Information Processing Systems, volume 31. Curran Asso- ciates, Inc., 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/file/ 2de5d16682c3c3500...

  2. [2]

    Learning latent dynamics for planning from pixels

    Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. InInternational conference on machine learning, pages 2555–2565. PMLR, 2019

  3. [3]

    Dream to control: Learning behaviors by latent imagination

    Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. InInternational Conference on Learning Representations, 2020

  4. [4]

    Genie: Generative interactive environments

    Jake Bruce, Michael D Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, et al. Genie: Generative interactive environments. InForty-first International Conference on Machine Learning, 2024

  5. [5]

    GAIA-1: A Generative World Model for Autonomous Driving

    Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. Gaia-1: A generative world model for autonomous driving.arXiv preprint arXiv:2309.17080, 2023

  6. [6]

    Video gen- eration models as world simulators

    Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, and Aditya Ramesh. Video gen- eration models as world simulators. OpenAI, 2024. URL https://openai.com/research/ video-generation-models-as-world-simulators

  7. [7]

    Cosmos World Foundation Model Platform for Physical AI

    Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, et al. Cosmos world foundation model platform for physical ai.arXiv preprint arXiv:2501.03575, 2025

  8. [8]

    World Simulation with Video Foundation Models for Physical AI

    Arslan Ali, Junjie Bai, Maciej Bala, Yogesh Balaji, Aaron Blakeman, Tiffany Cai, Jiaxin Cao, Tianshi Cao, Elizabeth Cha, Yu-Wei Chao, et al. World simulation with video foundation models for physical ai.arXiv preprint arXiv:2511.00062, 2025

  9. [9]

    Simulating the visual world with artificial intelligence: A roadmap.arXiv preprint arXiv:2511.08585, 2025

    Jingtong Yue, Ziqi Huang, Zhaoxi Chen, Xintao Wang, Pengfei Wan, and Ziwei Liu. Simulating the visual world with artificial intelligence: A roadmap.arXiv preprint arXiv:2511.08585, 2025

  10. [10]

    Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

    Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

  11. [11]

    Mastering Diverse Domains through World Models

    Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models.arXiv preprint arXiv:2301.04104, 2023

  12. [12]

    V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

    Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, et al. V-jepa 2: Self-supervised video models enable understanding, prediction and planning.arXiv preprint arXiv:2506.09985, 2025. 10

  13. [13]

    Irasim: Learning interactive real-robot action simulators.arXiv preprint arXiv:2406.14540, 2024

    Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, and Tao Kong. Irasim: Learning interactive real-robot action simulators.arXiv preprint arXiv:2406.14540, 2024

  14. [14]

    Robot learning from a physical world model.arXiv preprint arXiv:2511.07416, 2025

    Jiageng Mao, Sicheng He, Hao-Ning Wu, Yang You, Shuyang Sun, Zhicheng Wang, Yanan Bao, Huizong Chen, Leonidas Guibas, Vitor Guizilini, et al. Robot learning from a physical world model.arXiv preprint arXiv:2511.07416, 2025

  15. [15]

    Physworld: From real videos to world models of deformable objects via physics-aware demonstration synthesis.arXiv preprint arXiv:2510.21447, 2025

    Yu Yang, Zhilu Zhang, Xiang Zhang, Yihan Zeng, Hui Li, and Wangmeng Zuo. Physworld: From real videos to world models of deformable objects via physics-aware demonstration synthesis.arXiv preprint arXiv:2510.21447, 2025

  16. [16]

    Embodied ai agents: Modeling the world.arXiv preprint arXiv:2506.22355, 2025

    Pascale Fung, Yoram Bachrach, Asli Celikyilmaz, Kamalika Chaudhuri, Delong Chen, Willy Chung, Emmanuel Dupoux, Hongyu Gong, Hervé Jégou, Alessandro Lazaric, et al. Embodied ai agents: Modeling the world.arXiv preprint arXiv:2506.22355, 2025

  17. [17]

    Mujoco: A physics engine for model-based control

    Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ international conference on intelligent robots and systems, pages 5026–5033. IEEE, 2012

  18. [18]

    Project Chrono - An Open-Source Physics Engine

    Project Chrono. Project Chrono - An Open-Source Physics Engine. https://projectchrono.org/, 2026

  19. [19]

    Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

    Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, et al. Isaac gym: High performance gpu-based physics simulation for robot learning.arXiv preprint arXiv:2108.10470, 2021

  20. [20]

    Sapien: A simulated part-based interactive environment

    Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, et al. Sapien: A simulated part-based interactive environment. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11097–11107, 2020

  21. [21]

    Habitat 2.0: Training home assistants to rearrange their habitat

    Andrew Szot, Alexander Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Singh Chaplot, Oleksandr Maksymets, et al. Habitat 2.0: Training home assistants to rearrange their habitat. InAdvances in Neural Information Processing Systems, volume 34, pages 251–266, 2021

  22. [22]

    igibson 2.0: Object-centric simulation for robot learning of everyday household tasks

    Chengshu Li, Fei Xia, Roberto Martin-Martin, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Elliott Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, et al. igibson 2.0: Object-centric simulation for robot learning of everyday household tasks. InConference on Robot Learning, pages 455–465. PMLR, 2022

  23. [23]

    Procthor: Large-scale embodied ai using procedural generation.Advances in Neural Information Processing Systems, 35:5982–5994, 2022

    Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Kiana Ehsani, Jordi Salvador, Winson Han, Eric Kolve, Aniruddha Kembhavi, and Roozbeh Mottaghi. Procthor: Large-scale embodied ai using procedural generation.Advances in Neural Information Processing Systems, 35:5982–5994, 2022

  24. [24]

    Atiss: Autoregressive transformers for indoor scene synthesis

    Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, and Sanja Fidler. Atiss: Autoregressive transformers for indoor scene synthesis. InAdvances in Neural Information Processing Systems, volume 34, pages 12013–12026, 2021

  25. [25]

    Commonscenes: Generating commonsense 3d indoor scenes with scene graph diffusion

    Zifan Zhuang, Yian Wang, Xiaowen Qiu, Wojciech Matusik, Joshua B Tenenbaum, and Chuang Gan. Commonscenes: Generating commonsense 3d indoor scenes with scene graph diffusion. InAdvances in Neural Information Processing Systems, volume 36, 2023

  26. [26]

    Holodeck: Language guided generation of 3d embodied ai environments

    Yue Yang, Fan Zhao, Yihan Zhu, Peiyuan Zhang, Xiaofeng Chen, and Siyuan Huang. Holodeck: Language guided generation of 3d embodied ai environments. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

  27. [27]

    Physcene: Physically interactable 3d scene synthesis for embodied ai

    Yandan Yang, Baoxiong Jia, Peiyuan Zhi, and Siyuan Huang. Physcene: Physically interactable 3d scene synthesis for embodied ai. InProceedings of Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  28. [28]

    Physcensis: Physics-augmented llm agents for complex physical scene arrangement

    Yian Wang, Han Yang, Minghao Guo, Xiaowen Qiu, Tsun-Hsuan Wang, Wojciech Matusik, Joshua B Tenenbaum, and Chuang Gan. Physcensis: Physics-augmented llm agents for complex physical scene arrangement. InThe Fourteenth International Conference on Learning Representations, 2026

  29. [29]

    Re- Act: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. Re- Act: Synergizing reasoning and acting in language models. InInternational Conference on Learning Representations (ICLR), 2023

  30. [30]

    Reflexion: Language agents with verbal reinforcement learning

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. InAdvances in Neural Information Processing Systems, volume 36, 2023. 11

  31. [31]

    Self-refine: Iterative refinement with self-feedback

    Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback. Advances in neural information processing systems, 36:46534–46594, 2023

  32. [32]

    Executable code actions elicit better llm agents

    Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, and Heng Ji. Executable code actions elicit better llm agents. InForty-first International Conference on Machine Learning, 2024

  33. [33]

    SWE-agent: Agent-computer interfaces enable automated software engineering

    John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik R Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https: //openreview.net/forum?id=mXpq6ut8J3

  34. [34]

    Autocoderover: Autonomous program improvement.arXiv preprint arXiv:2404.05427, 2024

    Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. Autocoderover: Autonomous program improvement.arXiv preprint arXiv:2404.05427, 2024

  35. [35]

    Simbench: A framework for evaluating and diagnosing llm-based digital-twin generation for multi-physics simulation.IEEE Access, 14:61784–61808, 2026

    Jingquan Wang, Andrew Negrut, Hongyu Wang, Harry Zhang, and Dan Negrut. Simbench: A framework for evaluating and diagnosing llm-based digital-twin generation for multi-physics simulation.IEEE Access, 14:61784–61808, 2026. doi: 10.1109/ACCESS.2026.3685519

  36. [36]

    Chronollm: customizing language models for physics-based simulation code generation

    Jingquan Wang, Andrew Negrut, Harry Zhang, Khailanii Slaton, Shu Wang, Radu Serban, Jinlong Wu, and Dan Negrut. Chronollm: customizing language models for physics-based simulation code generation. Multibody System Dynamics, pages 1–45, 2026

  37. [37]

    CAMEL: Communicative agents for ”mind” exploration of large language model society

    Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for ”mind” exploration of large language model society. InThirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum? id=3IyL2XWDkG

  38. [38]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversation. arXiv preprint arXiv:2308.08155, 2023

  39. [39]

    MetaGPT: Meta programming for a multi-agent collaborative framework

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 20...

  40. [40]

    Multi-Agent Collaboration Mechanisms: A Survey of LLMs

    Khanh-Tung Tran, Dung Dao, Minh-Duong Nguyen, Quoc-Viet Pham, Barry O’Sullivan, and Hoang D Nguyen. Multi-agent collaboration mechanisms: A survey of llms.arXiv preprint arXiv:2501.06322, 2025

  41. [41]

    A self-correcting multi-agent llm framework for language-based physics simulation and explanation.npj Artificial Intelligence, 2(1):10, 2026

    Donggeun Park, Hyeonbin Moon, and Seunghwa Ryu. A self-correcting multi-agent llm framework for language-based physics simulation and explanation.npj Artificial Intelligence, 2(1):10, 2026

  42. [42]

    Creation, evaluation and self- validation of simulation models with large language models.Neurocomputing, page 132030, 2025

    Tobias Möltner, Peter Manzl, Michael Pieber, and Johannes Gerstmayr. Creation, evaluation and self- validation of simulation models with large language models.Neurocomputing, page 132030, 2025

  43. [43]

    Sketchfab.https://sketchfab.com, 2012

    Sketchfab Development Team. Sketchfab.https://sketchfab.com, 2012

  44. [44]

    Approximate convex decomposition for 3d meshes with collision-aware concavity and tree search.ACM Transactions on Graphics (TOG), 41(4):1–18, 2022

    Xinyue Wei, Minghua Liu, Zhan Ling, and Hao Su. Approximate convex decomposition for 3d meshes with collision-aware concavity and tree search.ACM Transactions on Graphics (TOG), 41(4):1–18, 2022

  45. [45]

    Gonzalez, Ion Stoica, Song Han, and Yao Lu

    Dacheng Li, Yunhao Fang, Yukang Chen, Shuo Yang, Shiyi Cao, Justin Wong, Michael Luo, Xiaolong Wang, Hongxu Yin, Joseph E. Gonzalez, Ion Stoica, Song Han, and Yao Lu. Worldmodelbench: Judging video generation models as world models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2026. URL https...

  46. [46]

    Wan: Open and Advanced Large-Scale Video Generative Models

    Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models.arXiv preprint arXiv:2503.20314, 2025

  47. [47]

    -z"‘ -> ‘camera_up=[0,0,1]‘ - ‘gravity_axis=

    Blender Foundation. Blender, 2024. URLhttps://www.blender.org. Version 4.0. 12 A Appendix A.1 The Use of Large Language Models In the preparation of this manuscript, the LLM was used for tasks such as grammar correction, sentence restructuring, and improving the overall readability of the manuscript. The LLM also assisted with code debuging and optimizati...

  48. [48]

    - Procedural ‘size: [sx, sy, sz]‘ is full width/depth/height, never half-extents

    Resolve every object’s full extents first. - Procedural ‘size: [sx, sy, sz]‘ is full width/depth/height, never half-extents. - For generated boxes, tanks, platforms, floating plates, and ramps, bbox values are derived directly from ‘size‘. - If a size is not specified and cannot be derived from an enclosing object or vehicle wheelbase, add ‘clarifications...

  49. [49]

    tank starts at the origin

    Pick one coordinate convention for the enclosing scene object and keep it. - Regular box containers use the normal center convention. - ‘generated_boundary‘ containers/tanks/channels are special: center in ‘x/y‘, floor at ‘position.z‘, rim at ‘position.z + size.z‘. A 4 x 2 x 1 m generated-boundary tank with floor at z=0 therefore has ‘position=(0, 0, 0)‘,...

  50. [50]

    Apply predicates to anchors (‘min_x‘, ‘max_x‘, ‘center_x‘, ‘bottom_z‘, ‘top_z‘) and recompute ‘position‘ from the final bbox

  51. [51]

    -x": A.max_x = B.min_x - distance side =

    Serialize the final numeric center in ‘position‘ for rigid/procedural bodies. For SPH fluid rows with ‘FREE-SURFACE-AT‘, the row’s ‘position.z‘ may denote the free-surface marker; codegen derives the sampler center separately from the free-surface height. ### Common coordinate derivations Container centered at the scene origin: ‘‘‘text B.center_x = 0 B.ce...

  52. [52]

    - Declare ‘topology.reference_heights‘ for shared z-layers

    Set scene invariants. - Declare ‘topology.reference_heights‘ for shared z-layers

  53. [53]

    - Fluid domain: ‘FREE-SURFACE-AT‘

    Pick predicate families per body. - Fluid domain: ‘FREE-SURFACE-AT‘. - Fluid container: ‘CONTAINS-FLUID‘ + resolved z + optional in-plane predicates. - Buoyant body: ‘FLOATS-AT-SURFACE‘ + height + in-plane predicates. - Flank along an axis: cardinal ‘FRONT-OF‘ / ‘BACK-OF‘ / ‘LEFT-OF‘ / ‘RIGHT-OF‘ + height + transverse alignment. - Bridge/beam spanning fla...

  54. [54]

    - Size declarations first (‘HEIGHT‘ and known ‘size‘), then z/support anchors, then in-plane placement, then orientation

    Emit predicates in dependency order. - Size declarations first (‘HEIGHT‘ and known ‘size‘), then z/support anchors, then in-plane placement, then orientation. - Keep all predicates for the same subject contiguous. - A referenced object must already be fully placed, except ‘"root"‘

  55. [55]

    subject":

    Self-check the resolved state. - Every asset and scene object appears in ‘objects[]‘; child objects carry a ‘topology.relation‘ or an equivalent resolved ‘pose‘. - Every row has concrete numeric ‘position.x/y/z‘ and ‘orientation.deg_z‘. - Spanning bodies lie between their flanks (check via ‘position.x‘ / ‘position.y‘ matching the midpoint of the flank cen...

  56. [56]

    vehicle starts on platform

    Read the user’s intent ("vehicle starts on platform" / "platforms beside the tank with tops level" / "plate floats on water")

  57. [57]

    Map intent→kind from the table above

  58. [58]

    Within that kind, pick the named variant that matches the geometric detail (which face? which Z alignment? which submersion depth?)

  59. [59]

    convex\" for static, \

    The formula for the chosen variant is in the corresponding subsection below. 22 ### Stacking / on-top patterns | ‘relation‘ | Formula | |---|---| | ‘spawned_on_top‘ | obj sits centered on top of ref. ‘obj.x = ref.x‘, ‘obj.y = ref.y‘, ‘obj.z = ref.z + ref.size.z/2 + obj.size.z/2‘ | | ‘placed_on_top‘ | alias of ‘spawned_on_top‘ | | ‘centered_on_ref‘ | obj c...