arxiv: 2605.14398 · v1 · submitted 2026-05-14 · 💻 cs.AI

Recognition: 2 theorem links

· Lean Theorem

Coding Agent Is Good As World Simulator

Hongyu Wang , Jingquan Wang , Bocheng Zou , Radu Serban , Dan Negrut

Authors on Pith no claims yet

Pith reviewed 2026-05-15 02:15 UTC · model grok-4.3

classification 💻 cs.AI

keywords agentcodesimulationmodelsphysicalvisualframeworkworld

0 comments

The pith

A multi-agent framework generates and refines executable physics simulation code from prompts to create world models that enforce physical constraints, claiming superior accuracy and fidelity over video-based alternatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

World models help AI systems predict and interact with environments. Most recent ones generate future video frames from past ones, but these often produce unrealistic results such as objects passing through each other or unstable motion because they learn statistical patterns without explicit physical rules. The paper instead uses software agents to write actual simulation code. A planning agent converts a text prompt into a structured scene description. A code agent implements it using a physics engine. A visual review agent examines the rendered output and a physics analysis agent checks consistency with rules like gravity, contacts, and momentum. The code is revised based on their feedback until the simulation matches the prompt and physical constraints. The resulting code can be executed step by step, allowing interactive use. The abstract reports that this code-based method achieves better physical accuracy, instruction following, and visual quality than advanced video models, with potential uses in driving simulation and robot training where correct physics matters for safe learning.

Core claim

Experimental results show that our framework outperforms advanced video-based models in physical accuracy, instruction fidelity and visual quality, which could be applied to various scenarios including driving simulation and embodied robot tasks.

Load-bearing premise

The assumption that the visual review and physics analysis agents can reliably detect and guide corrections for physical inconsistencies in generated code without ground-truth physics data or human intervention, allowing the iterative process to converge to valid simulations.

Figures

Figures reproduced from arXiv: 2605.14398 by Bocheng Zou, Dan Negrut, Hongyu Wang, Jingquan Wang, Radu Serban.

**Figure 1.** Figure 1: multi-agent pipeline. matches the confirmed plan. If it does, the system proceeds to the next stage. Otherwise, the validator returns a structured error report, and the code agent patches the current program. The loop ends until the final program satisfies all steps in the plan. 2.3 Plan Agent The plan agent converts an underspecified user request into a simulator-oriented plan before code generation, whic… view at source ↗

**Figure 2.** Figure 2: Robot in office scene. 25 [PITH_FULL_IMAGE:figures/full_fig_p025_2.png] view at source ↗

**Figure 3.** Figure 3: Vehicle in outdoor scene. 26 [PITH_FULL_IMAGE:figures/full_fig_p026_3.png] view at source ↗

**Figure 4.** Figure 4: Vehicle through FSI ground. 27 [PITH_FULL_IMAGE:figures/full_fig_p027_4.png] view at source ↗

read the original abstract

World models have emerged as a powerful paradigm for building interactive simulation environments, with recent video-based approaches demonstrating impressive progress in generating visually plausible dynamics. However, because these models typically infer dynamics from video and represent them in latent states, they do not explicitly enforce physical constraints. As a result, the generated video rollouts are not physically plausible, exhibiting unstable contacts, distorted shapes, or inconsistent motion. In this paper, we present an agentic framework constructing physics-based world models through executable simulation code. The framework coordinates planning, code generation, visual review, and physics analysis agents. The planning agent converts the natural language prompt into a structured scene plan, the code agent implements it as executable simulation code, and the visual review agent provide visual feedback while the physics analysis agent checks physical consistency. The code is iteratively revised based on the feedback until the simulation matches the prompt reqirements and physical constraints. Experimental results show that our framework outperforms advanced video-based models in physical accuracy, instruction fidelity and visual quality, which could be applied to various scenarios including driving simulation and embodied robot tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The multi-agent code pipeline for turning prompts into executable physics sims is a reasonable direction, but the physics validation step looks under-supported without external checks.

read the letter

The main point is that this paper outlines an agent team that breaks a natural language scene description into a plan, writes simulation code for it, then loops through visual review and physics analysis to fix problems until the code runs and matches the request. That coordination is the fresh piece relative to video-only world models that just predict frames in latent space. It directly targets the usual problems like bad contacts or shape distortion by forcing everything through runnable code instead of hoping the model learned the right dynamics. The setup for embodied tasks like robot training or driving sims makes sense as a way to get controllable environments. The abstract claims the results beat video baselines on physical accuracy, fidelity, and visuals, which would be a useful step if the numbers hold. The soft spot is the physics analysis agent itself. Nothing in the description shows it has ground-truth trajectories, an external solver, or formal verification to confirm its corrections actually produce valid physics rather than just plausible-looking fixes. If the agent misses a violation or accepts an approximate patch, the final code can still fail physically while the framework counts it as success. Without those details or the actual metrics and baselines, the outperformance claim is hard to evaluate. This is aimed at people building interactive simulators for robotics or autonomous systems who are already looking at agentic or code-based alternatives to pure generative video. It deserves a serious referee pass so the experiments and the physics agent's reliability can be checked in full, even if the validation approach needs tightening.

Referee Report

2 major / 1 minor

Summary. The paper presents an agentic framework for constructing physics-based world models via executable simulation code. A planning agent converts natural-language prompts into scene plans, a code agent generates simulation code, and visual-review plus physics-analysis agents provide iterative feedback to revise the code until it satisfies both prompt requirements and physical constraints. The central claim is that this code-based approach outperforms advanced video-based world models in physical accuracy, instruction fidelity, and visual quality, with applications to driving simulation and embodied robotics.

Significance. If the empirical results hold, the work would offer a concrete alternative to latent video dynamics by enforcing explicit, executable physics, which could improve controllability and long-horizon consistency in interactive simulators.

major comments (2)

[Abstract] Abstract: the assertion that the framework 'outperforms advanced video-based models in physical accuracy, instruction fidelity and visual quality' is presented without any metrics, baselines, dataset descriptions, or experimental methodology. This absence leaves the central empirical claim unsupported in the provided text.
[Abstract] Framework description (Abstract): the physics analysis agent is described as checking 'physical consistency' and driving code revisions solely via visual feedback and its own reasoning, with no reference to ground-truth trajectories, an external physics engine, or formal verification. Without such grounding it is unclear how the agent can reliably detect or correct violations such as unstable contacts or inconsistent motion, undermining attribution of any reported superiority to enforced physics rather than agent self-consistency.

minor comments (1)

[Abstract] Abstract: 'reqirements' is a typo and should read 'requirements'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below. We agree that the abstract requires strengthening to better support the empirical claims and will revise it accordingly while preserving the core contribution of the agentic code-based framework.

read point-by-point responses

Referee: [Abstract] Abstract: the assertion that the framework 'outperforms advanced video-based models in physical accuracy, instruction fidelity and visual quality' is presented without any metrics, baselines, dataset descriptions, or experimental methodology. This absence leaves the central empirical claim unsupported in the provided text.

Authors: We agree that the abstract, due to length constraints, omits specific metrics, baselines, and methodology details. The full manuscript (Section 4) provides these: comparisons against video models (e.g., specific baselines like those in recent works on video dynamics) using quantitative metrics such as physical violation counts, instruction adherence scores, and visual quality assessments on datasets including driving and robotics scenarios. In the revision, we will expand the abstract with a concise statement of key results (e.g., 'outperforms baselines by X% in physical accuracy on Y benchmark') to make the claim self-contained while directing readers to the experiments. revision: yes
Referee: [Abstract] Framework description (Abstract): the physics analysis agent is described as checking 'physical consistency' and driving code revisions solely via visual feedback and its own reasoning, with no reference to ground-truth trajectories, an external physics engine, or formal verification. Without such grounding it is unclear how the agent can reliably detect or correct violations such as unstable contacts or inconsistent motion, undermining attribution of any reported superiority to enforced physics rather than agent self-consistency.

Authors: The physics analysis agent detects violations by rendering simulation outputs and applying its pre-trained knowledge of physical laws to analyze motion, contacts, and stability directly from the visual feedback and generated code. This LLM-driven reasoning identifies issues like unstable contacts or inconsistent trajectories without external engines, allowing iterative code revisions to enforce constraints explicitly in the executable simulation. The superiority stems from the final code being physically grounded and runnable, unlike latent video models. We will revise the abstract and add a methods subsection with prompt examples and case studies of detected/corrected violations to clarify the process. revision: partial

Circularity Check

0 steps flagged

No circularity: agentic framework is an external iterative loop with no self-definitional reductions or fitted predictions

full rationale

The paper presents a multi-agent system (planning, code generation, visual review, physics analysis) that generates and refines executable simulation code via feedback until prompt and constraint satisfaction. No equations, parameters, or derivations appear that reduce to their own inputs by construction. Claims of outperformance rest on external experimental comparisons to video models rather than self-referential metrics or self-citations. The physics analysis step operates on visual and reasoning feedback without being defined in terms of the final output. This is a standard engineering description of an iterative pipeline and remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that specialized agents can iteratively produce and verify correct physics code from prompts; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Specialized agents can generate, review, and iteratively refine executable simulation code to satisfy both natural language prompts and physical constraints
The framework description assumes this capability without detailing mechanisms or providing evidence for reliable convergence.

pith-pipeline@v0.9.0 · 5489 in / 1333 out tokens · 50960 ms · 2026-05-15T02:15:46.574907+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The framework coordinates planning, code generation, visual review, and physics analysis agents... The code is iteratively revised based on the feedback until the simulation matches the prompt requirements and physical constraints.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Experimental results show that our framework outperforms advanced video-based models in physical accuracy...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 9 internal anchors

[1]

Recurrent world models facilitate policy evolution

David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Gar- nett, editors,Advances in Neural Information Processing Systems, volume 31. Curran Asso- ciates, Inc., 2018. URL https://proceedings.neurips.cc/paper_files/paper/2018/file/ 2de5d16682c3c3500...

work page 2018
[2]

Learning latent dynamics for planning from pixels

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. InInternational conference on machine learning, pages 2555–2565. PMLR, 2019

work page 2019
[3]

Dream to control: Learning behaviors by latent imagination

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. InInternational Conference on Learning Representations, 2020

work page 2020
[4]

Genie: Generative interactive environments

Jake Bruce, Michael D Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, et al. Genie: Generative interactive environments. InForty-first International Conference on Machine Learning, 2024

work page 2024
[5]

GAIA-1: A Generative World Model for Autonomous Driving

Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. Gaia-1: A generative world model for autonomous driving.arXiv preprint arXiv:2309.17080, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[6]

Video gen- eration models as world simulators

Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, and Aditya Ramesh. Video gen- eration models as world simulators. OpenAI, 2024. URL https://openai.com/research/ video-generation-models-as-world-simulators

work page 2024
[7]

Cosmos World Foundation Model Platform for Physical AI

Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, et al. Cosmos world foundation model platform for physical ai.arXiv preprint arXiv:2501.03575, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

World Simulation with Video Foundation Models for Physical AI

Arslan Ali, Junjie Bai, Maciej Bala, Yogesh Balaji, Aaron Blakeman, Tiffany Cai, Jiaxin Cao, Tianshi Cao, Elizabeth Cha, Yu-Wei Chao, et al. World simulation with video foundation models for physical ai.arXiv preprint arXiv:2511.00062, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[9]

Simulating the visual world with artificial intelligence: A roadmap.arXiv preprint arXiv:2511.08585, 2025

Jingtong Yue, Ziqi Huang, Zhaoxi Chen, Xintao Wang, Pengfei Wan, and Ziwei Liu. Simulating the visual world with artificial intelligence: A roadmap.arXiv preprint arXiv:2511.08585, 2025

work page arXiv 2025
[10]

Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

work page 2020
[11]

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models.arXiv preprint arXiv:2301.04104, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[12]

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, et al. V-jepa 2: Self-supervised video models enable understanding, prediction and planning.arXiv preprint arXiv:2506.09985, 2025. 10

work page internal anchor Pith review Pith/arXiv arXiv 2025
[13]

Irasim: Learning interactive real-robot action simulators.arXiv preprint arXiv:2406.14540, 2024

Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, and Tao Kong. Irasim: Learning interactive real-robot action simulators.arXiv preprint arXiv:2406.14540, 2024

work page arXiv 2024
[14]

Robot learning from a physical world model.arXiv preprint arXiv:2511.07416, 2025

Jiageng Mao, Sicheng He, Hao-Ning Wu, Yang You, Shuyang Sun, Zhicheng Wang, Yanan Bao, Huizong Chen, Leonidas Guibas, Vitor Guizilini, et al. Robot learning from a physical world model.arXiv preprint arXiv:2511.07416, 2025

work page arXiv 2025
[15]

Physworld: From real videos to world models of deformable objects via physics-aware demonstration synthesis.arXiv preprint arXiv:2510.21447, 2025

Yu Yang, Zhilu Zhang, Xiang Zhang, Yihan Zeng, Hui Li, and Wangmeng Zuo. Physworld: From real videos to world models of deformable objects via physics-aware demonstration synthesis.arXiv preprint arXiv:2510.21447, 2025

work page arXiv 2025
[16]

Embodied ai agents: Modeling the world.arXiv preprint arXiv:2506.22355, 2025

Pascale Fung, Yoram Bachrach, Asli Celikyilmaz, Kamalika Chaudhuri, Delong Chen, Willy Chung, Emmanuel Dupoux, Hongyu Gong, Hervé Jégou, Alessandro Lazaric, et al. Embodied ai agents: Modeling the world.arXiv preprint arXiv:2506.22355, 2025

work page arXiv 2025
[17]

Mujoco: A physics engine for model-based control

Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ international conference on intelligent robots and systems, pages 5026–5033. IEEE, 2012

work page 2012
[18]

Project Chrono - An Open-Source Physics Engine

Project Chrono. Project Chrono - An Open-Source Physics Engine. https://projectchrono.org/, 2026

work page 2026
[19]

Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, et al. Isaac gym: High performance gpu-based physics simulation for robot learning.arXiv preprint arXiv:2108.10470, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[20]

Sapien: A simulated part-based interactive environment

Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, et al. Sapien: A simulated part-based interactive environment. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11097–11107, 2020

work page 2020
[21]

Habitat 2.0: Training home assistants to rearrange their habitat

Andrew Szot, Alexander Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Singh Chaplot, Oleksandr Maksymets, et al. Habitat 2.0: Training home assistants to rearrange their habitat. InAdvances in Neural Information Processing Systems, volume 34, pages 251–266, 2021

work page 2021
[22]

igibson 2.0: Object-centric simulation for robot learning of everyday household tasks

Chengshu Li, Fei Xia, Roberto Martin-Martin, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Elliott Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, et al. igibson 2.0: Object-centric simulation for robot learning of everyday household tasks. InConference on Robot Learning, pages 455–465. PMLR, 2022

work page 2022
[23]

Procthor: Large-scale embodied ai using procedural generation.Advances in Neural Information Processing Systems, 35:5982–5994, 2022

Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Kiana Ehsani, Jordi Salvador, Winson Han, Eric Kolve, Aniruddha Kembhavi, and Roozbeh Mottaghi. Procthor: Large-scale embodied ai using procedural generation.Advances in Neural Information Processing Systems, 35:5982–5994, 2022

work page 2022
[24]

Atiss: Autoregressive transformers for indoor scene synthesis

Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, and Sanja Fidler. Atiss: Autoregressive transformers for indoor scene synthesis. InAdvances in Neural Information Processing Systems, volume 34, pages 12013–12026, 2021

work page 2021
[25]

Commonscenes: Generating commonsense 3d indoor scenes with scene graph diffusion

Zifan Zhuang, Yian Wang, Xiaowen Qiu, Wojciech Matusik, Joshua B Tenenbaum, and Chuang Gan. Commonscenes: Generating commonsense 3d indoor scenes with scene graph diffusion. InAdvances in Neural Information Processing Systems, volume 36, 2023

work page 2023
[26]

Holodeck: Language guided generation of 3d embodied ai environments

Yue Yang, Fan Zhao, Yihan Zhu, Peiyuan Zhang, Xiaofeng Chen, and Siyuan Huang. Holodeck: Language guided generation of 3d embodied ai environments. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

work page 2024
[27]

Physcene: Physically interactable 3d scene synthesis for embodied ai

Yandan Yang, Baoxiong Jia, Peiyuan Zhi, and Siyuan Huang. Physcene: Physically interactable 3d scene synthesis for embodied ai. InProceedings of Conference on Computer Vision and Pattern Recognition (CVPR), 2024

work page 2024
[28]

Physcensis: Physics-augmented llm agents for complex physical scene arrangement

Yian Wang, Han Yang, Minghao Guo, Xiaowen Qiu, Tsun-Hsuan Wang, Wojciech Matusik, Joshua B Tenenbaum, and Chuang Gan. Physcensis: Physics-augmented llm agents for complex physical scene arrangement. InThe Fourteenth International Conference on Learning Representations, 2026

work page 2026
[29]

Re- Act: Synergizing reasoning and acting in language models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. Re- Act: Synergizing reasoning and acting in language models. InInternational Conference on Learning Representations (ICLR), 2023

work page 2023
[30]

Reflexion: Language agents with verbal reinforcement learning

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. InAdvances in Neural Information Processing Systems, volume 36, 2023. 11

work page 2023
[31]

Self-refine: Iterative refinement with self-feedback

Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback. Advances in neural information processing systems, 36:46534–46594, 2023

work page 2023
[32]

Executable code actions elicit better llm agents

Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, and Heng Ji. Executable code actions elicit better llm agents. InForty-first International Conference on Machine Learning, 2024

work page 2024
[33]

SWE-agent: Agent-computer interfaces enable automated software engineering

John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik R Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https: //openreview.net/forum?id=mXpq6ut8J3

work page 2024
[34]

Autocoderover: Autonomous program improvement.arXiv preprint arXiv:2404.05427, 2024

Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. Autocoderover: Autonomous program improvement.arXiv preprint arXiv:2404.05427, 2024

work page arXiv 2024
[35]

Simbench: A framework for evaluating and diagnosing llm-based digital-twin generation for multi-physics simulation.IEEE Access, 14:61784–61808, 2026

Jingquan Wang, Andrew Negrut, Hongyu Wang, Harry Zhang, and Dan Negrut. Simbench: A framework for evaluating and diagnosing llm-based digital-twin generation for multi-physics simulation.IEEE Access, 14:61784–61808, 2026. doi: 10.1109/ACCESS.2026.3685519

work page doi:10.1109/access.2026.3685519 2026
[36]

Chronollm: customizing language models for physics-based simulation code generation

Jingquan Wang, Andrew Negrut, Harry Zhang, Khailanii Slaton, Shu Wang, Radu Serban, Jinlong Wu, and Dan Negrut. Chronollm: customizing language models for physics-based simulation code generation. Multibody System Dynamics, pages 1–45, 2026

work page 2026
[37]

CAMEL: Communicative agents for ”mind” exploration of large language model society

Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for ”mind” exploration of large language model society. InThirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum? id=3IyL2XWDkG

work page 2023
[38]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversation. arXiv preprint arXiv:2308.08155, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[39]

MetaGPT: Meta programming for a multi-agent collaborative framework

Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 20...

work page 2024
[40]

Multi-Agent Collaboration Mechanisms: A Survey of LLMs

Khanh-Tung Tran, Dung Dao, Minh-Duong Nguyen, Quoc-Viet Pham, Barry O’Sullivan, and Hoang D Nguyen. Multi-agent collaboration mechanisms: A survey of llms.arXiv preprint arXiv:2501.06322, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[41]

A self-correcting multi-agent llm framework for language-based physics simulation and explanation.npj Artificial Intelligence, 2(1):10, 2026

Donggeun Park, Hyeonbin Moon, and Seunghwa Ryu. A self-correcting multi-agent llm framework for language-based physics simulation and explanation.npj Artificial Intelligence, 2(1):10, 2026

work page 2026
[42]

Creation, evaluation and self- validation of simulation models with large language models.Neurocomputing, page 132030, 2025

Tobias Möltner, Peter Manzl, Michael Pieber, and Johannes Gerstmayr. Creation, evaluation and self- validation of simulation models with large language models.Neurocomputing, page 132030, 2025

work page 2025
[43]

Sketchfab.https://sketchfab.com, 2012

Sketchfab Development Team. Sketchfab.https://sketchfab.com, 2012

work page 2012
[44]

Approximate convex decomposition for 3d meshes with collision-aware concavity and tree search.ACM Transactions on Graphics (TOG), 41(4):1–18, 2022

Xinyue Wei, Minghua Liu, Zhan Ling, and Hao Su. Approximate convex decomposition for 3d meshes with collision-aware concavity and tree search.ACM Transactions on Graphics (TOG), 41(4):1–18, 2022

work page 2022
[45]

Gonzalez, Ion Stoica, Song Han, and Yao Lu

Dacheng Li, Yunhao Fang, Yukang Chen, Shuo Yang, Shiyi Cao, Justin Wong, Michael Luo, Xiaolong Wang, Hongxu Yin, Joseph E. Gonzalez, Ion Stoica, Song Han, and Yao Lu. Worldmodelbench: Judging video generation models as world models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2026. URL https...

work page 2026
[46]

Wan: Open and Advanced Large-Scale Video Generative Models

Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models.arXiv preprint arXiv:2503.20314, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[47]

-z"‘ -> ‘camera_up=[0,0,1]‘ - ‘gravity_axis=

Blender Foundation. Blender, 2024. URLhttps://www.blender.org. Version 4.0. 12 A Appendix A.1 The Use of Large Language Models In the preparation of this manuscript, the LLM was used for tasks such as grammar correction, sentence restructuring, and improving the overall readability of the manuscript. The LLM also assisted with code debuging and optimizati...

work page 2024
[48]

- Procedural ‘size: [sx, sy, sz]‘ is full width/depth/height, never half-extents

Resolve every object’s full extents first. - Procedural ‘size: [sx, sy, sz]‘ is full width/depth/height, never half-extents. - For generated boxes, tanks, platforms, floating plates, and ramps, bbox values are derived directly from ‘size‘. - If a size is not specified and cannot be derived from an enclosing object or vehicle wheelbase, add ‘clarifications...

work page
[49]

tank starts at the origin

Pick one coordinate convention for the enclosing scene object and keep it. - Regular box containers use the normal center convention. - ‘generated_boundary‘ containers/tanks/channels are special: center in ‘x/y‘, floor at ‘position.z‘, rim at ‘position.z + size.z‘. A 4 x 2 x 1 m generated-boundary tank with floor at z=0 therefore has ‘position=(0, 0, 0)‘,...

work page
[50]

Apply predicates to anchors (‘min_x‘, ‘max_x‘, ‘center_x‘, ‘bottom_z‘, ‘top_z‘) and recompute ‘position‘ from the final bbox

work page
[51]

-x": A.max_x = B.min_x - distance side =

Serialize the final numeric center in ‘position‘ for rigid/procedural bodies. For SPH fluid rows with ‘FREE-SURFACE-AT‘, the row’s ‘position.z‘ may denote the free-surface marker; codegen derives the sampler center separately from the free-surface height. ### Common coordinate derivations Container centered at the scene origin: ‘‘‘text B.center_x = 0 B.ce...

work page
[52]

- Declare ‘topology.reference_heights‘ for shared z-layers

Set scene invariants. - Declare ‘topology.reference_heights‘ for shared z-layers

work page
[53]

- Fluid domain: ‘FREE-SURFACE-AT‘

Pick predicate families per body. - Fluid domain: ‘FREE-SURFACE-AT‘. - Fluid container: ‘CONTAINS-FLUID‘ + resolved z + optional in-plane predicates. - Buoyant body: ‘FLOATS-AT-SURFACE‘ + height + in-plane predicates. - Flank along an axis: cardinal ‘FRONT-OF‘ / ‘BACK-OF‘ / ‘LEFT-OF‘ / ‘RIGHT-OF‘ + height + transverse alignment. - Bridge/beam spanning fla...

work page
[54]

- Size declarations first (‘HEIGHT‘ and known ‘size‘), then z/support anchors, then in-plane placement, then orientation

Emit predicates in dependency order. - Size declarations first (‘HEIGHT‘ and known ‘size‘), then z/support anchors, then in-plane placement, then orientation. - Keep all predicates for the same subject contiguous. - A referenced object must already be fully placed, except ‘"root"‘

work page
[55]

subject":

Self-check the resolved state. - Every asset and scene object appears in ‘objects[]‘; child objects carry a ‘topology.relation‘ or an equivalent resolved ‘pose‘. - Every row has concrete numeric ‘position.x/y/z‘ and ‘orientation.deg_z‘. - Spanning bodies lie between their flanks (check via ‘position.x‘ / ‘position.y‘ matching the midpoint of the flank cen...

work page
[56]

vehicle starts on platform

Read the user’s intent ("vehicle starts on platform" / "platforms beside the tank with tops level" / "plate floats on water")

work page
[57]

Map intent→kind from the table above

work page
[58]

Within that kind, pick the named variant that matches the geometric detail (which face? which Z alignment? which submersion depth?)

work page
[59]

convex\" for static, \

The formula for the chosen variant is in the corresponding subsection below. 22 ### Stacking / on-top patterns | ‘relation‘ | Formula | |---|---| | ‘spawned_on_top‘ | obj sits centered on top of ref. ‘obj.x = ref.x‘, ‘obj.y = ref.y‘, ‘obj.z = ref.z + ref.size.z/2 + obj.size.z/2‘ | | ‘placed_on_top‘ | alias of ‘spawned_on_top‘ | | ‘centered_on_ref‘ | obj c...

work page