Specification-Driven Generation and Evaluation of Discrete-Event World Models via the DEVS Formalism

Chuanhao Li; Huiteng Zhuang; Zheyu Chen; Zhuohuan Li

arxiv: 2603.03784 · v2 · pith:CKRKZTMInew · submitted 2026-03-04 · 💻 cs.AI


Zheyu Chen, Huiteng Zhuang, Zhuohuan Li, Chuanhao Li

Pith reviewed 2026-05-22 10:17 UTC · model grok-4.3

classification 💻 cs.AI

keywords discrete-event world models · DEVS formalism · specification-driven generation · LLM agents · long-horizon rollouts · event trace validation · online synthesis


The pith

Natural-language specifications can be turned into reliable discrete-event world models for LLM agents via a staged pipeline grounded in the DEVS formalism.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes synthesizing discrete-event world models from natural-language specifications for LLM agents operating in event-driven domains such as supply chains and business processes. These domains evolve through discrete events, timing constraints, and causal links rather than continuous physical dynamics. The approach occupies a middle ground between hand-engineered simulators, which are consistent but expensive to adapt, and neural models, which are flexible but prone to accumulating errors over long rollouts. A staged LLM pipeline first infers component interactions and structure, then derives event and timing logic for each component using the DEVS formalism. Evaluation relies on benchmark suites that generate structured event traces and validate them against temporal, causal, and semantic constraints derived from the original specifications.

Core claim

Adopting the DEVS formalism, a staged LLM-based generation pipeline separates structural inference over component interactions from component-level event and timing logic, yielding world models that remain consistent over long-horizon rollouts, support verification from observable behavior, and can be synthesized efficiently on demand during online execution.
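To make the separation concrete, here is a minimal Python sketch under assumed data structures. `Port`, `Component`, `Coupling`, `Structure`, and `component_context` are hypothetical names, not the paper's code. Stage 1 is represented by a `Structure` that records components, typed ports, and couplings but no behavior. `component_context` builds the local view a stage-2 prompt might receive for a single component, so that each component's event and timing logic can be derived from its own interface rather than from the whole model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    type_name: str  # e.g. "int", "dict"


@dataclass(frozen=True)
class Component:
    name: str
    in_ports: tuple[Port, ...]
    out_ports: tuple[Port, ...]


@dataclass(frozen=True)
class Coupling:
    src: str       # source component name
    src_port: str  # output port on the source
    dst: str       # destination component name
    dst_port: str  # input port on the destination


@dataclass(frozen=True)
class Structure:
    """Stage-1 output: which components exist and how they are wired; no behavior."""
    components: tuple[Component, ...]
    couplings: tuple[Coupling, ...]


def component_context(structure: Structure, name: str) -> dict:
    """Local view for one component: its own ports plus the couplings touching it.

    Raises KeyError if the component is not part of the structure.
    """
    comp = {c.name: c for c in structure.components}[name]
    return {
        "name": comp.name,
        "inputs": {p.name: p.type_name for p in comp.in_ports},
        "outputs": {p.name: p.type_name for p in comp.out_ports},
        "upstream": [(k.src, k.src_port, k.dst_port) for k in structure.couplings if k.dst == name],
        "downstream": [(k.src_port, k.dst, k.dst_port) for k in structure.couplings if k.src == name],
    }


# Hypothetical two-component warehouse example: a dispatcher sends tasks to one robot.
warehouse = Structure(
    components=(
        Component("dispatcher", (Port("done", "str"),), (Port("task", "dict"),)),
        Component("robot", (Port("task", "dict"),), (Port("done", "str"),)),
    ),
    couplings=(
        Coupling("dispatcher", "task", "robot", "task"),
        Coupling("robot", "done", "dispatcher", "done"),
    ),
)

print(component_context(warehouse, "robot"))
```

For the robot, the context contains only its `task` input, its `done` output, and the two couplings that touch it. The dispatcher's internal logic never enters the robot's stage-2 input.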

What carries the argument

The DEVS formalism: a formal specification method for discrete-event systems built from hierarchical components with input/output ports, external and internal state transitions, output functions, and time-advance functions.
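In Zeigler's presentation, a DEVS atomic model is the tuple M = <X, S, Y, δ_int, δ_ext, λ, ta>: an input set, a state set, an output set, internal and external transition functions, an output function, and a time-advance function. The plain-Python sketch below shows those pieces for a hypothetical single-job processor, plus a small driver that turns timestamped inputs into an event trace. It is illustrative only and does not follow the API of any DEVS toolkit. It also assumes one particular tie-breaking rule (an internal transition fires before a simultaneous external input), which is only one of several conventions in use.

```python
import math


class Processor:
    """Hypothetical DEVS atomic model: accepts one job at a time on port "in"
    and emits it on port "done" after a fixed processing time."""

    def __init__(self, processing_time: float):
        self.processing_time = processing_time
        self.phase = "passive"   # "passive" (idle) or "busy"
        self.sigma = math.inf    # time remaining in the current phase
        self.job = None

    def ta(self) -> float:
        """Time advance: how long the model stays put if no input arrives."""
        return self.sigma

    def output(self):
        """Output function (lambda), evaluated just before delta_int."""
        return ("done", self.job) if self.phase == "busy" else None

    def delta_int(self):
        """Internal transition: the current job is finished; become idle."""
        self.phase, self.sigma, self.job = "passive", math.inf, None

    def delta_ext(self, elapsed: float, port: str, value):
        """External transition: start a job if idle. A busy model ignores the
        input but still subtracts the elapsed time from its remaining sigma."""
        if self.phase == "passive" and port == "in":
            self.phase, self.sigma, self.job = "busy", self.processing_time, value
        else:
            self.sigma -= elapsed


def simulate(model, inputs, until):
    """Drive one atomic model with timestamped (time, port, value) inputs and
    return the event trace as (time, port, value) tuples."""
    pending = sorted(inputs)
    t_last, trace = 0.0, []
    while True:
        t_int = t_last + model.ta()
        t_ext = pending[0][0] if pending else math.inf
        t = min(t_int, t_ext)
        if t == math.inf or t > until:
            return trace
        if t_int <= t_ext:
            # Assumed tie-break: internal events fire before simultaneous inputs.
            out = model.output()
            if out is not None:
                trace.append((t, *out))
            model.delta_int()
        else:
            _, port, value = pending.pop(0)
            model.delta_ext(t - t_last, port, value)
            trace.append((t, port, value))
        t_last = t


print(simulate(Processor(2.0), [(0.0, "in", "job1"), (1.0, "in", "job2"), (3.0, "in", "job3")], until=10.0))
# [(0.0, 'in', 'job1'), (1.0, 'in', 'job2'), (2.0, 'done', 'job1'), (3.0, 'in', 'job3'), (5.0, 'done', 'job3')]
```

Here `job2` arrives while the processor is busy and is ignored, so the trace shows only `job1` and `job3` completing, each 2.0 time units after it was accepted.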

If this is right

World models can be generated and adapted online without manual simulator engineering for each new scenario.
Structured event traces allow reproducible verification and pinpoint diagnosis of any deviations from the input specification.
The resulting models combine the reproducibility of explicit simulators with the flexibility to respond to new natural-language descriptions.
Verification against specification-derived constraints becomes feasible directly from observable simulator output.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This staged separation of structure from behavior logic may serve as a template for grounding other LLM-generated artifacts in formalisms that support long-horizon consistency.
The same pipeline could be tested on hybrid systems that mix discrete events with continuous dynamics to see whether the separation still prevents error accumulation.
If successful, agents could plan and evaluate actions in organizational or logistical settings with greater reliability than current neural-only world models allow.

Load-bearing premise

The LLM pipeline can accurately infer component structures and derive event timing logic from text without introducing inconsistencies that compound during extended simulations.

What would settle it

Generate a model from a specification, run long-horizon rollouts, and check whether the emitted event traces begin to violate the temporal or causal constraints stated in the original specification.
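This check can be sketched as a trace validator. The sketch assumes a hypothetical event schema (dicts with `t`, `type`, and `id` keys) and example constraints a specification might state: temporal ones (timestamps never decrease; each `done` occurs within `max_delay` of its `start`) and a causal one (every `done` is preceded by an unmatched `start` with the same `id`). Reporting the index of each offending event is what keeps the diagnosis local: over a long rollout, the first reported index marks where the trace starts to drift.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    index: int    # position of the offending event in the trace
    kind: str     # "temporal" or "causal"
    message: str


def check_trace(trace, cause="start", effect="done", max_delay=None):
    """Validate an event trace against simple, specification-style constraints.

    Temporal: timestamps never decrease, and (if max_delay is set) each effect
    occurs within max_delay of its matching cause.
    Causal: each effect is preceded by an unmatched cause with the same id.
    """
    violations = []
    open_causes = {}          # id -> timestamp of a cause not yet matched
    latest = float("-inf")    # largest timestamp seen so far
    for i, ev in enumerate(trace):
        t, etype, key = ev["t"], ev["type"], ev["id"]
        if t < latest:
            violations.append(Violation(i, "temporal", f"t={t} is earlier than t={latest}"))
        latest = max(latest, t)
        if etype == cause:
            open_causes[key] = t
        elif etype == effect:
            start = open_causes.pop(key, None)
            if start is None:
                violations.append(Violation(i, "causal", f"{effect} for {key!r} has no prior {cause}"))
            elif max_delay is not None and t - start > max_delay:
                violations.append(Violation(i, "temporal", f"{key!r} took {t - start}, limit {max_delay}"))
    return violations


rollout = [
    {"t": 0.0, "type": "start", "id": "a"},
    {"t": 1.0, "type": "start", "id": "b"},
    {"t": 2.5, "type": "done", "id": "a"},   # too slow when max_delay=2.0
    {"t": 2.0, "type": "done", "id": "b"},   # timestamp goes backwards
    {"t": 9.0, "type": "done", "id": "c"},   # effect without a cause
]
print([(v.index, v.kind) for v in check_trace(rollout, max_delay=2.0)])
# [(2, 'temporal'), (3, 'temporal'), (4, 'causal')]
```

Popping the matched `start` means a duplicated `done` for the same `id` is also reported as a causal violation.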

Figures

Figures reproduced from arXiv: 2603.03784 by Chuanhao Li, Huiteng Zhuang, Zheyu Chen, Zhuohuan Li.

**Figure 1.** Illustrative example of the generation and execution of a discrete-event world model for a warehouse robot fleet.

**Figure 2.** Generation pipeline of the discrete-event world model for warehouse robot fleet restocking.

**Figure 3.** Ablation study on synthesis latency (using GPT-5.2). The chart compares the wall-clock time required for the …

**Figure 4.** Visualized PlanTree hierarchy for the ABP Model. The root model recursively decomposes into sub-models …

**Figure 5.** The final connection of the ABP DEVS model.
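Figure 4 describes a root model that recursively decomposes into sub-models. One way to represent and traverse such a hierarchy is sketched below. `PlanNode`, `generation_order`, and `atomic_leaves` are hypothetical names, the example tree is invented rather than copied from Figure 4, and the bottom-up (post-order) ordering is an assumption, not the paper's documented procedure. Visiting children before parents guarantees that every sub-model exists before the coupled model that wires it together is assembled.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan-tree node: coupled if it has children, atomic otherwise."""
    name: str
    children: list["PlanNode"] = field(default_factory=list)

    @property
    def is_atomic(self) -> bool:
        return not self.children


def generation_order(node: PlanNode) -> list[str]:
    """Post-order walk: every sub-model is listed before the model that contains it."""
    order = []
    for child in node.children:
        order.extend(generation_order(child))
    order.append(node.name)
    return order


def atomic_leaves(node: PlanNode) -> list[str]:
    """Names of the atomic models, i.e. the components that need event/timing logic."""
    if node.is_atomic:
        return [node.name]
    return [name for child in node.children for name in atomic_leaves(child)]


# Illustrative decomposition only; not the ABP tree shown in Figure 4.
abp = PlanNode("ABP", [
    PlanNode("Sender"),
    PlanNode("Network", [PlanNode("DataChannel"), PlanNode("AckChannel")]),
    PlanNode("Receiver"),
])
print(generation_order(abp))
# ['Sender', 'DataChannel', 'AckChannel', 'Network', 'Receiver', 'ABP']
print(atomic_leaves(abp))
# ['Sender', 'DataChannel', 'AckChannel', 'Receiver']
```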

Original abstract

World models are central to LLM agents that must evaluate actions over long horizons. Yet much existing work focuses on environments governed by physical dynamics or spatial structure, whereas many high-impact domains, including supply chains, procurement networks, and business processes, evolve through discrete events, timing constraints, and causal dependencies. These settings call for discrete-event world models. Existing approaches to constructing world models often fall near two extremes: hand-engineered simulators provide consistency and reproducibility, but are costly to build and adapt; neural models are flexible, but can suffer from compounding inconsistency over long-horizon rollouts. We seek a principled middle ground by synthesizing discrete-event world models online from natural-language specifications, retaining the reliability of explicit simulators while gaining the adaptability of neural models. We adopt the DEVS formalism and introduce a staged LLM-based generation pipeline that separates structural inference over component interactions from component-level event and timing logic. For evaluation, we develop benchmark suites in which simulators emit structured event traces, which are then validated against specification-derived temporal, causal, and semantic constraints. This enables reproducible verification and localized diagnostics. Together, these contributions produce world models that remain consistent over long-horizon rollouts, can be verified from observable behavior, and can be synthesized efficiently on demand during online execution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper sketches a staged LLM pipeline to generate DEVS discrete-event simulators from natural-language specs for LLM agents, but the abstract supplies no results or error data, so the consistency claim remains untested.


The main point is that the authors want to generate reliable discrete-event world models for LLM agents by feeding natural-language specs into an LLM that first infers component structure and then derives event and timing logic, all inside the DEVS formalism, with trace validation against the original constraints. This targets supply chains and business processes where timing and causality dominate over continuous physics. The separation of stages is the clearest new element; prior work on neural world models rarely decomposes the problem this way or anchors it in a formal discrete-event semantics. The benchmark plan that emits structured traces and checks them for temporal and causal violations is also a practical step for localized debugging. That framing of the middle ground between hand-built and fully neural models is reasonable and addresses a real domain gap.

The soft spot is the missing substance. The writeup describes the pipeline and the evaluation approach but gives no generated examples, no rollout lengths, no error rates, and no comparison to simpler baselines or hand-crafted DEVS models. Without those, it is impossible to tell whether the second stage actually preserves timing consistency or whether inconsistencies only appear after many coupled events. The stress-test note on this exact risk is on target given what is shown.

Readers working on reliable planning for logistics or workflow agents would get the most from the DEVS angle and the verification setup. The paper is coherent on its own terms and engages the right literature, so it deserves a serious referee who can ask for the missing implementation details and empirical checks.

Referee Report

2 major / 1 minor

Summary. The paper proposes synthesizing discrete-event world models from natural-language specifications using the DEVS formalism. It introduces a staged LLM pipeline that first performs structural inference over component interactions and then derives component-level event and timing logic. Evaluation relies on benchmark suites where simulators produce structured event traces that are validated against specification-derived temporal, causal, and semantic constraints, with the goal of ensuring consistency over long-horizon rollouts while combining the reliability of explicit simulators with the adaptability of neural models.

Significance. If the staged pipeline can be shown to produce verifiable DEVS models without compounding inconsistencies, the work would provide a useful middle ground for high-impact discrete-event domains such as supply chains and business processes. The formal grounding in DEVS and the emphasis on observable-trace validation are positive features that could support reproducible verification. However, the absence of any concrete derivations, empirical results, error metrics, or validation data in the manuscript makes the practical significance difficult to assess at present.

major comments (2)

[staged LLM-based generation pipeline] The manuscript provides no concrete mechanism (e.g., type checking, invariant extraction, or cross-validation between the structural-inference and component-logic stages) that would prevent or detect timing or causality errors introduced by the LLM in the second stage. This is load-bearing for the central claim that the generated models retain the reliability of explicit simulators over long-horizon rollouts.
[evaluation and benchmark suites] No empirical results, error rates, or example benchmark outcomes are reported to demonstrate that the generated DEVS models satisfy the specification-derived constraints. Without such data it is not possible to evaluate whether the proposed evaluation approach actually supports the consistency claims.

minor comments (1)

[abstract] The abstract and introduction would benefit from a short concrete example of a natural-language specification and the corresponding DEVS structure to illustrate the pipeline stages.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments identify key areas where additional detail and evidence are needed to support the central claims. We respond to each major comment below and indicate the revisions we will make.


Referee: [staged LLM-based generation pipeline] The manuscript provides no concrete mechanism (e.g., type checking, invariant extraction, or cross-validation between the structural-inference and component-logic stages) that would prevent or detect timing or causality errors introduced by the LLM in the second stage. This is load-bearing for the central claim that the generated models retain the reliability of explicit simulators over long-horizon rollouts.

Authors: We agree that the current manuscript describes the staged pipeline at a high level and does not yet specify concrete mechanisms such as type checking, invariant extraction, or cross-validation to mitigate errors between the structural-inference and component-logic stages. In the revised manuscript we will expand the pipeline description to include these mechanisms explicitly: the structural stage will output typed interfaces and extracted invariants that constrain the component-logic stage, followed by an automated cross-validation step that checks for timing and causality consistency before the DEVS model is finalized. These additions will directly bolster the claim of simulator-like reliability over long horizons. revision: yes
Referee: [evaluation and benchmark suites] No empirical results, error rates, or example benchmark outcomes are reported to demonstrate that the generated DEVS models satisfy the specification-derived constraints. Without such data it is not possible to evaluate whether the proposed evaluation approach actually supports the consistency claims.

Authors: The manuscript currently emphasizes the framework, benchmark design, and validation methodology but does not report empirical results, error rates, or concrete benchmark outcomes. We acknowledge that quantitative evidence is required to substantiate the consistency claims. In the revision we will add a new evaluation section that presents preliminary results from the benchmark suites, including error rates for temporal, causal, and semantic constraint violations, example event traces, and analysis of long-horizon consistency. This will enable direct assessment of the evaluation approach. revision: yes
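The first response above promises typed interfaces from the structural stage and an automated cross-validation step before the model is finalized. A minimal sketch of one such check follows, assuming interfaces are recorded as port-name-to-type maps. The function name and data layout are hypothetical and are not the authors' implementation. The sketch covers structural consistency only: it flags couplings that reference missing ports or connect mismatched types, and warns about inputs that nothing drives. Timing and causality would still need trace-level checks.

```python
def check_couplings(interfaces, couplings):
    """Cross-check stage-1 couplings against typed port interfaces.

    interfaces: {component: {"in": {port: type_name}, "out": {port: type_name}}}
    couplings:  [(src_component, src_port, dst_component, dst_port), ...]
    Returns human-readable problems; an empty list means no problem was found.
    """
    problems = []
    for src, src_port, dst, dst_port in couplings:
        out_type = interfaces.get(src, {}).get("out", {}).get(src_port)
        in_type = interfaces.get(dst, {}).get("in", {}).get(dst_port)
        if out_type is None:
            problems.append(f"{src}.{src_port}: no such output port")
        if in_type is None:
            problems.append(f"{dst}.{dst_port}: no such input port")
        if out_type is not None and in_type is not None and out_type != in_type:
            problems.append(f"{src}.{src_port} ({out_type}) -> {dst}.{dst_port} ({in_type}): type mismatch")
    # An input that no coupling drives may indicate a missing connection. This is
    # only a warning: a top-level input can legitimately be fed from outside.
    driven = {(dst, dst_port) for _, _, dst, dst_port in couplings}
    for component, ports in interfaces.items():
        for port in ports.get("in", {}):
            if (component, port) not in driven:
                problems.append(f"{component}.{port}: input port is never driven")
    return problems


interfaces = {
    "dispatcher": {"in": {"done": "str"}, "out": {"task": "dict"}},
    "robot": {"in": {"task": "dict", "recharge": "bool"}, "out": {"done": "int"}},
}
couplings = [
    ("dispatcher", "task", "robot", "task"),
    ("robot", "done", "dispatcher", "done"),
]
for problem in check_couplings(interfaces, couplings):
    print(problem)
# robot.done (int) -> dispatcher.done (str): type mismatch
# robot.recharge: input port is never driven
```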

Circularity Check

0 steps flagged

No circularity: methodological proposal with external benchmarks


The paper introduces a staged LLM pipeline for synthesizing DEVS world models from natural-language specifications and pairs it with benchmark suites that emit event traces for validation against specification-derived constraints. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text. The central claims rest on the separation of structural inference from component logic and on reproducible verification via observable traces, which are evaluated externally rather than defined in terms of the pipeline outputs themselves. No self-citations or ansatzes are invoked to justify load-bearing steps, and the approach is self-contained against the proposed benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the untested premise that current LLMs can reliably translate natural-language specifications into accurate DEVS structures and timing rules; the abstract invokes the DEVS formalism as a standard modeling tool but does not supply independent evidence for the translation step.

axioms (1)

standard math: The DEVS formalism correctly captures discrete-event dynamics with timing and causal constraints.
Used as the target representation for all generated world models.

pith-pipeline@v0.9.0 · 5762 in / 1310 out tokens · 41113 ms · 2026-05-22T10:17:58.695341+00:00 · methodology


Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 2 internal anchors

[1]

Recurrent world models facilitate policy evolution.Advances in neural information processing systems, 31, 2018

David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution.Advances in neural information processing systems, 31, 2018

work page 2018
[2]

Pwm: Policy learning with multi-task world models

Ignat Georgiev, Varun Giridhar, Nicklas Hansen, and Animesh Garg. Pwm: Policy learning with multi-task world models. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[3]

Reasoning with language model is planning with world model

Shibo Hao, Yi Gu, Haodi Ma, Joshua Hong, Zhen Wang, Daisy Wang, and Zhiting Hu. Reasoning with language model is planning with world model. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8154–8173, 2023

work page 2023
[4]

Is your llm secretly a world model of the internet? model-based planning for web agents.arXiv preprint arXiv:2411.06559, 2024

Yu Gu, Kai Zhang, Yuting Ning, Boyuan Zheng, Boyu Gou, Tianci Xue, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, et al. Is your llm secretly a world model of the internet? model-based planning for web agents. arXiv preprint arXiv:2411.06559, 2024

work page arXiv 2024
[5]

Philip J. Ball, Jakob Bauer, Frank Belletti, Bethanie Brownfield, Ariel Ephrat, Shlomi Fruchter, Agrim Gupta, Kristian Holsheimer, Aleksander Holynski, Jiri Hron, Christos Kaplanis, Marjorie Limont, Matt McGill, Yanko Oliveira, Jack Parker-Holder, Frank Perbet, Guy Scully, Jeremy Shar, Stephen Spencer, Omer Tov, Ruben Villegas, Emma Wang, Jessica Yung, Ci...

work page 2025
[6]

Web world models.arXiv preprint arXiv:2512.23676, 2025

Jichen Feng, Yifan Zhang, Chenggong Zhang, Yifu Lu, Shilong Liu, and Mengdi Wang. Web world models.arXiv preprint arXiv:2512.23676, 2025

work page arXiv 2025
[7]

hallucinations

Yixia Li, Hongru Wang, Jiahao Qiu, Zhenfei Yin, Dongdong Zhang, Cheng Qian, Zeping Li, Pony Ma, Guanhua Chen, Heng Ji, et al. From word to world: Can large language models be implicit text-based world models?arXiv preprint arXiv:2512.18832, 2025

work page arXiv 2025
[8]

ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. Alfworld: Aligning text and embodied environments for interactive learning.arXiv preprint arXiv:2010.03768, 2020. 13 Generation and Evaluation of Discrete-Event World Models via the DEVS Formalism

work page internal anchor Pith review Pith/arXiv arXiv 2010
[9]

Humam Kourani, Alessandro Berti, Daniel Schuster, and Wil M. P. van der Aalst. Evaluating large language models on business process modeling: Framework, benchmark, and self-improvement analysis.Software and Systems Modeling, pages 1–36, 2025

work page 2025
[10]

Scalable, symbiotic, ai and non-ai agent based parallel discrete event simulations.arXiv preprint arXiv:2505.23846, 2025

Atanu Barai, Stephan Eidenbenz, and Nandakishore Santhi. Scalable, symbiotic, ai and non-ai agent based parallel discrete event simulations.arXiv preprint arXiv:2505.23846, 2025

work page arXiv 2025
[11]

Academic press, 2000

Bernard P Zeigler, Herbert Praehofer, and Tag Gon Kim.Theory of modeling and simulation. Academic press, 2000

work page 2000
[12]

Risco-Martín, Saurabh Mittal, Kevin Henares, Román Cardenas, and Patricia Arroba

José L. Risco-Martín, Saurabh Mittal, Kevin Henares, Román Cardenas, and Patricia Arroba. xDEVS: A toolkit for interoperable modeling and simulation of formal discrete event systems.Software: Practice and Experience, 53(3):748–789, March 2023

work page 2023
[13]

Language models meet world models: Embodied experiences enhance language models.Advances in neural information processing systems, 36:75392–75412, 2023

Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, and Zhiting Hu. Language models meet world models: Embodied experiences enhance language models.Advances in neural information processing systems, 36:75392–75412, 2023

work page 2023
[14]

Web agents with world models: Learning and leveraging environment dynamics in web navigation

Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, Minju Gwak, Gwanwoo Song, Jihoon Kim, Sunghwan Kim, Dongha Lee, and Jinyoung Yeo. Web agents with world models: Learning and leveraging environment dynamics in web navigation. InThe Thirteenth International Conference on Learning Representations (ICLR 2025), 2025

work page 2025
[15]

Daydreamer: World models for physical robot learning

Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. Daydreamer: World models for physical robot learning. InConference on robot learning, pages 2226–2240. PMLR, 2023

work page 2023
[16]

Text2world: Benchmarking large language models for symbolic world model generation, 2025

Mengkang Hu, Tianxing Chen, Yude Zou, Yuheng Lei, Qiguang Chen, Ming Li, Yao Mu, Hongyuan Zhang, Wenqi Shao, and Ping Luo. Text2world: Benchmarking large language models for symbolic world model generation, 2025

work page 2025
[17]

Open-domain planning representations from texts

Renhao Zhang, Yilin Miao, and George Konidaris. Open-domain planning representations from texts. In Proceedings of the Natural Language Reasoning and Structured Explanations Workshop (NLRSE). Association for Computational Linguistics, 2024

work page 2024
[18]

Large language models as planning domain generators

James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, and Shirin Sohrabi. Large language models as planning domain generators. InProceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling, pages 423–431, 2024. arXiv:2405.06650

work page arXiv 2024
[19]

Leveraging environment interaction for automated pddl translation and planning with large language models

Sadegh Mahdavi, Raquel Aoki, Keyi Tang, and Yanshuai Cao. Leveraging environment interaction for automated pddl translation and planning with large language models. InAdvances in Neural Information Processing Systems (NeurIPS 2024), 2024

work page 2024
[20]

A system model generation benchmark from natural language requirements, 2025

Dongming Jin, Zhi Jin, Linyu Li, Zheng Fang, Jia Li, and Xiaohong Chen. A system model generation benchmark from natural language requirements, 2025

work page 2025
[21]

J. Chen, B. Hu, W. Diao, and Y . Huang. Automatic generation of sysml requirement models based on free-text requirements. InProceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering (EITCE 2022), pages 242–248, 2022

work page 2022
[22]

Generating sysml behavior models via large language models

Han Zhou et al. Generating sysml behavior models via large language models. InProceedings of the 16th Asia-Pacific Symposium on Internetware (Internetware 2025), 2025

work page 2025
[23]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

work page 2021
[24]

Jimenez, John Yang, Alexander Wettig, Kilian Lieret, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan

Carlos E. Jimenez, John Yang, Alexander Wettig, Kilian Lieret, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? InThe Twelfth International Conference on Learning Representations (ICLR 2024), 2024

work page 2024
[25]

RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems

Tianyang Liu, Canwen Xu, and Julian McAuley. Repobench: Benchmarking repository-level code auto- completion systems. InThe Twelfth International Conference on Learning Representations (ICLR 2024), 2024. arXiv:2306.03091. 14 Generation and Evaluation of Discrete-Event World Models via the DEVS Formalism

work page internal anchor Pith review Pith/arXiv arXiv 2024
[26]

Nl2repo-bench: Towards long-horizon repository generation evaluation of coding agents, 2025

Jingzhe Ding, Shengda Long, Changxin Pu, Huan Zhou, Hongwan Gao, Xiang Gao, Chao He, Yue Hou, Fei Hu, Zhaojian Li, Weiran Shi, Zaiyuan Wang, Daoguang Zan, Chenchen Zhang, Xiaoxu Zhang, Qizhi Chen, Xianfu Cheng, Bo Deng, Qingshui Gu, Kai Hua, Juntao Lin, Pai Liu, Mingchen Li, Xuanguang Pan, Zifan Peng, Yujia Qin, Yong Shan, Zhewen Tan, Weihao Xie, Zihan Wa...

work page 2025
[27]

Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press

John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. Swe-agent: Agent-computer interfaces enable automated software engineering, 2024

work page 2024
[28]

Fea-bench: A benchmark for evaluating repository-level code generation for feature implementation, 2025

Wei Li, Xin Zhang, Zhongxin Guo, Shaoguang Mao, Wen Luo, Guangyue Peng, Yangyu Huang, Houfeng Wang, and Scarlett Li. Fea-bench: A benchmark for evaluating repository-level code generation for feature implementation, 2025

work page 2025
[29]

Model based testing with labelled transition systems

Jan Tretmans. Model based testing with labelled transition systems. InFormal Methods and Testing, volume 4949 ofLecture Notes in Computer Science, pages 1–38. Springer, 2008

work page 2008
[30]

A brief account of runtime verification.Journal of Logic and Algebraic Programming, 78(5):293–303, 2009

Martin Leucker and Christian Schallhart. A brief account of runtime verification.Journal of Logic and Algebraic Programming, 78(5):293–303, 2009

work page 2009
[31]

An automated verification framework for devs-coupled models using devs-python.Processes, 13(5):1327, 2025

Gyeongmin Lee. An automated verification framework for devs-coupled models using devs-python.Processes, 13(5):1327, 2025

work page 2025
[32]

A unit testing platform to verify devs models

Ignacio Henares, David Ruiz-Martínez, and Eduardo Fernández-Medina. A unit testing platform to verify devs models. InProceedings of the 2020 Winter Simulation Conference (WSC), pages 2707–2718. IEEE, 2020

work page 2020
[33]

Sarjoughian

Jianhua Li and Hessam S. Sarjoughian. A testing framework for devs formalism implementations. InProceedings of the 2011 Winter Simulation Conference (WSC), pages 2735–2746. IEEE, 2011

work page 2011
[34]

Zeigler, James J

Bernard P. Zeigler, James J. Nutaro, and Changhwan Seo. Combining devs and model checking: Concepts and tools for integrating simulation and analysis.International Journal of Simulation and Process Modelling, 12(1):2–15, 2017

work page 2017
[35]

Nutaro, and Hessam S

Mohammad Gholami, James J. Nutaro, and Hessam S. Sarjoughian. Constrained-devs: A formalism for constrain- ing discrete event system specifications. InProceedings of the 2017 Winter Simulation Conference (WSC), pages 1523–1534. IEEE, 2017

work page 2017
[36]

On-the-fly verification of discrete event simulations by means of simulation purposes

Ricardo da Silva, Alexandre de Melo, and Rogério de Lemos. On-the-fly verification of discrete event simulations by means of simulation purposes. InProceedings of the 2011 Symposium on Theory of Modeling & Simulation – DEVS Integrative M&S Symposium, pages 73–80. Society for Modeling & Simulation International, 2011

work page 2011
[37]

McLaughlin and Hessam S

Michael J. McLaughlin and Hessam S. Sarjoughian. Devs-scripting: A black-box test frame for devs models. In Proceedings of the 2020 Winter Simulation Conference (WSC), pages 2666–2677. IEEE, 2020

work page 2020
[38]

Tsong Yueh Chen, S. C. Cheung, and S. M. Yiu. Metamorphic testing: A new approach for generating next test cases.Technical Report HKUST-CS98-01, 1998

work page 1998
[39]

" " 3Defines a specific argument , input port , or output port . 4

Xingyao Wang, Simon Rosenberg, Juan Michelini, Calvin Smith, Hoang Tran, Engel Nyst, Rohit Malhotra, Xuhui Zhou, Valerie Chen, Robert Brennan, and Graham Neubig. The openhands software agent sdk: A composable and extensible foundation for production agents, 2025. 15 Generation and Evaluation of Discrete-Event World Models via the DEVS Formalism A Structur...

work page 2025
[40]

** Separate Concerns **: Isolate the * Core Logic * ( Model ) from the * External C o n t r o l l e r *. - The External C o n t r o l l e r is r e s p o n s i b l e for stdin reading , external event injecting , ar gu me nt s parsing , s i m u l a t i o n control , and writing list [ dict ] logs in logger to stdout / stderr as JSONL . - The Core Model is ...

work page
[41]

** Extract Logic **: You are not writing code , but you MUST extract the specific a l g o r i t h m i c rules ( model structure , math , probabilities , delays ) into the ‘ function ‘ field

work page
[42]

Maintain a balance variable

** Extract Logging **: If any logging / e v e n t _ o u t p u t is required , describe what to log in the ‘ logging ‘ field . - You can add new logging events , but do not change the existing ones . - Unless specified , the output should be co mp let ed by the core model . ## [ Input Data ] ** Target Model Name **: ‘{ name } ‘ ** Raw R e q u i r e m e n t...

work page
[43]

- Only int , float , bool , str , dict , list are allowed

** Type D e s c r i p t i o n s **: - This is the r e q u i r e m e n t s for the ‘ model_init_args ‘ , ‘ input_ports ‘ , and ‘ output_ports ‘. - Only int , float , bool , str , dict , list are allowed

work page
[44]

dict " , st ru ct ur e =

** Complex Type Schema **: - For any input or output port that uses a ’ dict ’ or ’ list ’ type , you MUST e x p l i c i t l y describe the str uc tu re in the s tr uc tu re field . This includes listing all keys , their types , and a brief st ru ctu re for each key . - ** No Gold Plating **: strictly adhere to the data fields sp eci fi ed in the [ Parent...

work page
[45]

In a closed feedback loop ( AâĘŤB ) , co nf igu re Model A to send an initial ’ start ’ signal to Model B at T =0 to kick off the cycle

** Active Trigger **: " In a closed feedback loop ( AâĘŤB ) , co nf igu re Model A to send an initial ’ start ’ signal to Model B at T =0 to kick off the cycle ."

work page
[46]

I n i t i a l i z e the Manager with ‘ credits =N ‘ , allowing it to dispatch N tasks i m m e d i a t e l y without waiting for the first ’ worker_free ’ signals

** Pre - loaded Credit **: " I n i t i a l i z e the Manager with ‘ credits =N ‘ , allowing it to dispatch N tasks i m m e d i a t e l y without waiting for the first ’ worker_free ’ signals ."

work page
[47]

Co nf igu re Workers to send a ’ register ’ event to the Router at T =0 , ensuring the Router has valid d e s t i n a t i o n s p op ula te d before the first packet arrives

** Early R e g i s t r a t i o n **: " Co nf igu re Workers to send a ’ register ’ event to the Router at T =0 , ensuring the Router has valid d e s t i n a t i o n s p op ula te d before the first packet arrives ." - ** Supreme Court Rule **: If [ Parent Model S p e c i f i c a t i o n ] implies a c o n n e c t i o n that creates a cold - start deadlock ...

work page
[48]

models import Atomic , Coupled , Port ‘

Must import : ‘ from xdevs . models import Atomic , Coupled , Port ‘. Atomic for inherit , Coupled for __init__ arg type

work page
[49]

** I n h e r i t a n c e **: Inherit from ‘ Atomic ‘

work page
[50]

General function -

** D ocs tr in g **: The class MUST include a standard class do cs tri ng strictly f oll ow in g this format : ‘‘‘ python class { name }( Atomic ) : \"\"\" Function : - ... General function - ... Every state : how it transfer , and what to output after the state is over . Logging in this model : - ... - ... Input Ports : - p or t_ na me ( type ) : d e s c...

work page
[55]

name ") ) ‘ and ‘ self . a d d _ o u t _ p o r t ( Port ( type ,

Register Ports : Use ‘ self . a d d _ i n _ p o r t ( Port ( type , " name ") ) ‘ and ‘ self . a d d _ o u t _ p o r t ( Port ( type , " name ") ) ‘

work page
[56]

hold_in ( phase , time ) ‘

I n i t i a l i z e State : Set member v ar ia bl es and call ‘ self . hold_in ( phase , time ) ‘

work page
[57]

Log creation : ‘ self . logger . info ({{ keys : values , ...}} , log_type =...) ‘

work page
[58]

S O M E _ S T A T E

** Core Be ha vio rs **: - I mp le men t ‘ i n i t i a l i z e ( self ) ‘: Set initial state . Set phase / sigma using ‘ self . hold_in ( phase , time ) ‘. Log i n i t i a l i z a t i o n . - It can not send any output . If you need to send a initial signal ( e . g . report you are ready ) , you can use ‘ self . hold_in ( phase , time ) ‘ to schedule the ...

[59]

**Logging requirements**: make sure all the events required are logged. And the keys and structures of the logs must match the Specification exactly. Coupled Model Instructions (Injected into Main Prompt) ### [Coupled Model Specifics]

[60]

Must import: `from xdevs.models import Atomic, Coupled, Port`

[61]

**Inheritance**: Inherit from `xdevs.models.Coupled`

[62]

**Docstring**: The class MUST include a standard class docstring strictly following this format: ```python class {name}(Coupled): """ Function: - ... - ... - Sub-models: - sub_model_class_name: name = sub_model_instance_name. Brief description. Logging in this model: ...

[63]

**Container Logic**: Treat this class as a pure structure container. Implement ONLY `__init__`

[64]

**Sub-models Imports**: Use relative imports for sub-models (e.g., `from .folder.file import SubModelName`)

[65]

**Constructor (`__init__`)**: - Signature: `def __init__(self, name: str, parent: Coupled | None, <explicit_config_args>)` - Docstring: should have a docstring describing the arguments, including the detailed type and description, using the following format: ```python """ Args: name (str): The unique name of the model ...

[66]

Call `super().__init__(name)`

[67]

Assign `self.parent = parent`

[68]

Initialize logger: `self.logger = get_sim_logger(self)`

[69]

Register Ports: Use `self.add_in_port(...)` and `self.add_out_port(...)`

[70]

Instantiate Components: Create sub-model instances and register them via `self.add_component(instance)`

[71]

Define Couplings: Use `self.add_coupling(src, dst)` for:
- **EIC**: `self.input["port_name"]` -> `sub.input["port_name"]`
- **IC**: `sub_a.output["port_name"]` -> `sub_b.input["port_name"]`
- **EOC**: `sub.output["port_name"]` -> `self.output["port_name"]`
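The constructor steps and the three coupling kinds can be illustrated without xDEVS. The sketch below uses a hypothetical `MiniCoupled` container, not the xDEVS `Coupled` class. In it, ports are `(owner, port_name)` tuples, with owner `"self"` standing for the container. The sketch classifies each coupling as EIC, IC, or EOC and flags couplings whose port names the referenced model does not declare. That is the kind of name inconsistency the note in [72] asks the agent to correct against the sub-models. The sub-model and port names are invented for the example.

```python
class MiniCoupled:
    """Hypothetical stand-in for a DEVS coupled model: a pure structure container.

    Ports are (owner, port_name) tuples; owner "self" denotes this container,
    any other owner is a sub-model instance name.
    """

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.in_ports, self.out_ports = set(), set()
        self.components = {}   # instance name -> (input port names, output port names)
        self.couplings = []    # list of (src, dst) tuples

    def add_component(self, name, in_ports, out_ports):
        self.components[name] = (set(in_ports), set(out_ports))

    def add_coupling(self, src, dst):
        self.couplings.append((src, dst))

    @staticmethod
    def kind(src, dst):
        """Classify a coupling as EIC, EOC, or IC."""
        if src[0] == "self":
            return "EIC"   # container input -> sub-model input
        if dst[0] == "self":
            return "EOC"   # sub-model output -> container output
        return "IC"        # sub-model output -> sub-model input

    def _ports(self, owner, direction):
        """Port names on `owner` that may serve as a coupling source or destination."""
        if owner == "self":
            # A container's inputs act as sources, its outputs as destinations.
            return self.in_ports if direction == "src" else self.out_ports
        ins, outs = self.components.get(owner, (set(), set()))
        return outs if direction == "src" else ins

    def check(self):
        """Return couplings that reference a port name the owner does not declare."""
        return [(s, d) for s, d in self.couplings
                if s[1] not in self._ports(s[0], "src") or d[1] not in self._ports(d[0], "dst")]


net = MiniCoupled("Network")
net.in_ports, net.out_ports = {"start"}, {"done"}
net.add_component("sender", in_ports=["start", "ack"], out_ports=["packet", "done"])
net.add_component("subnet1", in_ports=["in"], out_ports=["out"])
net.add_coupling(("self", "start"), ("sender", "start"))     # EIC
net.add_coupling(("sender", "packet"), ("subnet1", "in"))    # IC
net.add_coupling(("sender", "done"), ("self", "done"))       # EOC
net.add_coupling(("subnet1", "output"), ("sender", "ack"))   # IC with a wrong port name
print([MiniCoupled.kind(s, d) for s, d in net.couplings])   # ['EIC', 'IC', 'EOC', 'IC']
print(net.check())  # [(('subnet1', 'output'), ('sender', 'ack'))]
```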

[72]

Log creation: `self.logger.info(...)` - Note: For steps 5-6, you should refer to Sub-Models to get the right init args names and port names. These information can be used as a correction and supplement to the coupling logic (in case some names are inconsistent). B.3 Interface Adaptation Agent The Mode...

[73]

System Objective: Design a communication system consisting of a Sender, a Receiver, and two uni-directional transmission channels (Subnets). The goal is to transmit a sequence of packets reliably using an Alternating Bit Protocol (ABP) despite deterministic packet loss in the ch...

[74]

Entity Behaviors:

The Sender:
- Accepts a single control input at the start of simulation: the total number of packets to send.
- Before sending each packet, the Sender must undergo a preparation delay (default 10 ms, configurable via `--sender_delay`).
- The Receiver must maintain a buffer with cap...
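The following is a minimal, untimed sketch of the Alternating Bit Protocol logic described above. It assumes loss is decided by caller-supplied deterministic predicates `drop_data(i)` and `drop_ack(i)` applied to the i-th transmission on each channel. The specification's seeded LCG noise generator is not reproduced here. Timeouts and the Sender's preparation delay are abstracted away: a lost frame or ack simply triggers a retransmission.

```python
def abp_transfer(packets, drop_data, drop_ack):
    """Untimed ABP sketch; drop_data(i) / drop_ack(i) decide loss of the i-th transmission."""
    delivered = []
    sender_bit, expected_bit = 0, 0
    data_tx = ack_tx = 0   # transmissions attempted on each channel
    k = 0                  # index of the packet the Sender is currently sending
    while k < len(packets):
        bit, payload = sender_bit, packets[k]
        lost = drop_data(data_tx)
        data_tx += 1
        if lost:
            continue       # timeout: retransmit the same frame
        if bit == expected_bit:
            delivered.append(payload)  # new frame: deliver it and flip the expected bit
            expected_bit ^= 1
        # The Receiver acknowledges the bit it received; duplicates are re-acked, not re-delivered.
        ack_lost = drop_ack(ack_tx)
        ack_tx += 1
        if ack_lost:
            continue       # timeout: the Sender retransmits, the Receiver sees a duplicate
        if bit == sender_bit:
            sender_bit ^= 1  # matching ack: advance to the next packet
            k += 1
    return delivered, data_tx, ack_tx


print(abp_transfer(["a", "b", "c"], lambda i: i == 1, lambda i: i == 0))
# (['a', 'b', 'c'], 5, 4)
```

In the example, the first ack is lost, so packet `a` is resent. Its retransmission is also dropped once, and when a copy finally gets through, the Receiver recognizes it as a duplicate by its bit. Each packet is still delivered exactly once. If a predicate drops every transmission, the loop never terminates, just as the real protocol would keep retrying.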

[75]

Scenario Constraints:
- Time Unit Mapping: 1.0 simulation time unit = 1 millisecond (ms).
- System starts at time 0.0 with all components initialized to idle states.

Listing 2: Natural Language Specification (S)

[76]

Command Line Arguments: The script must accept the following named arguments:
* `--total_packets` (int): The total number of packets the Sender intends to send in one session triggered by a START_BATCH command.
* `--seed` (int): The initialization seed for the noise generator of both sides (the `x` value in the LCG formula).
...
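Here is a minimal sketch of this command-line interface using the standard `argparse` module. The argument names come from the specification. A few details are assumptions: making `--total_packets` and `--seed` required, parsing `--sender_delay` as a float with default 10.0 (in ms, since 1.0 time unit = 1 ms), and the parser description.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="ABP discrete-event simulation (sketch)")
    parser.add_argument("--total_packets", type=int, required=True,
                        help="packets the Sender sends in one START_BATCH session")
    parser.add_argument("--seed", type=int, required=True,
                        help="initial x of the LCG noise generator for both channels")
    parser.add_argument("--sender_delay", type=float, default=10.0,
                        help="preparation delay before each packet, in ms (1.0 time unit = 1 ms)")
    return parser


args = build_parser().parse_args(["--total_packets", "5", "--seed", "42"])
print(args.total_packets, args.seed, args.sender_delay)  # 5 42 10.0
```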

[77]

stdin Format:
* No stdin input is required for this simulation. The system uses command line arguments for configuration.

[78]

**Standard Output (stdout)**:
* Format: JSONL, one independent JSON object per line
* Each record MUST follow the format: `{"time": <float>, "entity": <str>, "event": <str>, "payload": <dict>}`
* **Event Types and Formats**: Sender Events: - event: `delay_start` (Sender starts preparation del...
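A checker for the stdout record format above might look like the following. It uses only the standard `json` module. Three choices are assumptions rather than rules stated in the specification: accepting integer times, rejecting booleans, and treating extra top-level keys as errors. The `delay_start` payload in the example is left empty because its actual fields are truncated in the source.

```python
import json

# Required top-level keys and their accepted Python types after json.loads.
REQUIRED = {"time": (int, float), "entity": str, "event": str, "payload": dict}


def to_jsonl(records):
    """Serialize log records, one independent JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


def validate_line(line):
    """Return a list of problems for one JSONL line (an empty list means valid)."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not JSON: {exc.msg}"]
    if not isinstance(rec, dict):
        return ["record is not a JSON object"]
    problems = []
    for key, typ in REQUIRED.items():
        if key not in rec:
            problems.append(f"missing key: {key}")
        elif not isinstance(rec[key], typ) or isinstance(rec[key], bool):
            # bool is a subclass of int in Python, so exclude it explicitly.
            problems.append(f"bad type for {key}: {type(rec[key]).__name__}")
    extra = set(rec) - set(REQUIRED)
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    return problems


lines = to_jsonl([{"time": 0.0, "entity": "Sender", "event": "delay_start", "payload": {}}]).splitlines()
print([validate_line(l) for l in lines])  # [[]]
print(validate_line('{"time": "0", "entity": "Sender", "event": "x"}'))
# ['bad type for time: str', 'missing key: payload']
```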

[1]

David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. Advances in Neural Information Processing Systems, 31, 2018.

[2]

Ignat Georgiev, Varun Giridhar, Nicklas Hansen, and Animesh Garg. PWM: Policy learning with multi-task world models. In The Thirteenth International Conference on Learning Representations, 2025.

[3]

Shibo Hao, Yi Gu, Haodi Ma, Joshua Hong, Zhen Wang, Daisy Wang, and Zhiting Hu. Reasoning with language model is planning with world model. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8154–8173, 2023.

[4]

Yu Gu, Kai Zhang, Yuting Ning, Boyuan Zheng, Boyu Gou, Tianci Xue, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, et al. Is your LLM secretly a world model of the internet? Model-based planning for web agents. arXiv preprint arXiv:2411.06559, 2024.

[5]

Philip J. Ball, Jakob Bauer, Frank Belletti, Bethanie Brownfield, Ariel Ephrat, Shlomi Fruchter, Agrim Gupta, Kristian Holsheimer, Aleksander Holynski, Jiri Hron, Christos Kaplanis, Marjorie Limont, Matt McGill, Yanko Oliveira, Jack Parker-Holder, Frank Perbet, Guy Scully, Jeremy Shar, Stephen Spencer, Omer Tov, Ruben Villegas, Emma Wang, Jessica Yung, Ci...

[6]

Jichen Feng, Yifan Zhang, Chenggong Zhang, Yifu Lu, Shilong Liu, and Mengdi Wang. Web world models. arXiv preprint arXiv:2512.23676, 2025.

[7]

Yixia Li, Hongru Wang, Jiahao Qiu, Zhenfei Yin, Dongdong Zhang, Cheng Qian, Zeping Li, Pony Ma, Guanhua Chen, Heng Ji, et al. From word to world: Can large language models be implicit text-based world models? arXiv preprint arXiv:2512.18832, 2025.

[8]

Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. ALFWorld: Aligning text and embodied environments for interactive learning. arXiv preprint arXiv:2010.03768, 2020.

[9]

Humam Kourani, Alessandro Berti, Daniel Schuster, and Wil M. P. van der Aalst. Evaluating large language models on business process modeling: Framework, benchmark, and self-improvement analysis. Software and Systems Modeling, pages 1–36, 2025.

[10]

Atanu Barai, Stephan Eidenbenz, and Nandakishore Santhi. Scalable, symbiotic, AI and non-AI agent based parallel discrete event simulations. arXiv preprint arXiv:2505.23846, 2025.

[11]

Bernard P. Zeigler, Herbert Praehofer, and Tag Gon Kim. Theory of Modeling and Simulation. Academic Press, 2000.

[12]

José L. Risco-Martín, Saurabh Mittal, Kevin Henares, Román Cardenas, and Patricia Arroba. xDEVS: A toolkit for interoperable modeling and simulation of formal discrete event systems. Software: Practice and Experience, 53(3):748–789, March 2023.

[13]

Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, and Zhiting Hu. Language models meet world models: Embodied experiences enhance language models. Advances in Neural Information Processing Systems, 36:75392–75412, 2023.

[14]

Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, Minju Gwak, Gwanwoo Song, Jihoon Kim, Sunghwan Kim, Dongha Lee, and Jinyoung Yeo. Web agents with world models: Learning and leveraging environment dynamics in web navigation. In The Thirteenth International Conference on Learning Representations (ICLR 2025), 2025.

[15]

Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. DayDreamer: World models for physical robot learning. In Conference on Robot Learning, pages 2226–2240. PMLR, 2023.

[16]

Mengkang Hu, Tianxing Chen, Yude Zou, Yuheng Lei, Qiguang Chen, Ming Li, Yao Mu, Hongyuan Zhang, Wenqi Shao, and Ping Luo. Text2World: Benchmarking large language models for symbolic world model generation, 2025.

[17]

Renhao Zhang, Yilin Miao, and George Konidaris. Open-domain planning representations from texts. In Proceedings of the Natural Language Reasoning and Structured Explanations Workshop (NLRSE). Association for Computational Linguistics, 2024.

[18]

James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, and Shirin Sohrabi. Large language models as planning domain generators. In Proceedings of the Thirty-Fourth International Conference on Automated Planning and Scheduling, pages 423–431, 2024. arXiv:2405.06650.

[19]

Sadegh Mahdavi, Raquel Aoki, Keyi Tang, and Yanshuai Cao. Leveraging environment interaction for automated PDDL translation and planning with large language models. In Advances in Neural Information Processing Systems (NeurIPS 2024), 2024.

[20]

Dongming Jin, Zhi Jin, Linyu Li, Zheng Fang, Jia Li, and Xiaohong Chen. A system model generation benchmark from natural language requirements, 2025.

[21]

J. Chen, B. Hu, W. Diao, and Y. Huang. Automatic generation of SysML requirement models based on free-text requirements. In Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering (EITCE 2022), pages 242–248, 2022.

[22]

Han Zhou et al. Generating SysML behavior models via large language models. In Proceedings of the 16th Asia-Pacific Symposium on Internetware (Internetware 2025), 2025.

[23]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

[24]

Carlos E. Jimenez, John Yang, Alexander Wettig, Kilian Lieret, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? In The Twelfth International Conference on Learning Representations (ICLR 2024), 2024.

[25]

Tianyang Liu, Canwen Xu, and Julian McAuley. RepoBench: Benchmarking repository-level code auto-completion systems. In The Twelfth International Conference on Learning Representations (ICLR 2024), 2024. arXiv:2306.03091.

[26]

Jingzhe Ding, Shengda Long, Changxin Pu, Huan Zhou, Hongwan Gao, Xiang Gao, Chao He, Yue Hou, Fei Hu, Zhaojian Li, Weiran Shi, Zaiyuan Wang, Daoguang Zan, Chenchen Zhang, Xiaoxu Zhang, Qizhi Chen, Xianfu Cheng, Bo Deng, Qingshui Gu, Kai Hua, Juntao Lin, Pai Liu, Mingchen Li, Xuanguang Pan, Zifan Peng, Yujia Qin, Yong Shan, Zhewen Tan, Weihao Xie, Zihan Wa... Nl2repo-bench: Towards long-horizon repository generation evaluation of coding agents, 2025.

[27]

John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering, 2024.

[28]

Wei Li, Xin Zhang, Zhongxin Guo, Shaoguang Mao, Wen Luo, Guangyue Peng, Yangyu Huang, Houfeng Wang, and Scarlett Li. FEA-Bench: A benchmark for evaluating repository-level code generation for feature implementation, 2025.

[29]

Jan Tretmans. Model based testing with labelled transition systems. In Formal Methods and Testing, volume 4949 of Lecture Notes in Computer Science, pages 1–38. Springer, 2008.

[30]

Martin Leucker and Christian Schallhart. A brief account of runtime verification. Journal of Logic and Algebraic Programming, 78(5):293–303, 2009.

[31]

Gyeongmin Lee. An automated verification framework for DEVS-coupled models using DEVS-Python. Processes, 13(5):1327, 2025.

[32]

Ignacio Henares, David Ruiz-Martínez, and Eduardo Fernández-Medina. A unit testing platform to verify DEVS models. In Proceedings of the 2020 Winter Simulation Conference (WSC), pages 2707–2718. IEEE, 2020.

[33]

Jianhua Li and Hessam S. Sarjoughian. A testing framework for DEVS formalism implementations. In Proceedings of the 2011 Winter Simulation Conference (WSC), pages 2735–2746. IEEE, 2011.

[34]

Bernard P. Zeigler, James J. Nutaro, and Changhwan Seo. Combining DEVS and model checking: Concepts and tools for integrating simulation and analysis. International Journal of Simulation and Process Modelling, 12(1):2–15, 2017.

[35]

Mohammad Gholami, James J. Nutaro, and Hessam S. Sarjoughian. Constrained-DEVS: A formalism for constraining discrete event system specifications. In Proceedings of the 2017 Winter Simulation Conference (WSC), pages 1523–1534. IEEE, 2017.

[36]

Ricardo da Silva, Alexandre de Melo, and Rogério de Lemos. On-the-fly verification of discrete event simulations by means of simulation purposes. In Proceedings of the 2011 Symposium on Theory of Modeling & Simulation – DEVS Integrative M&S Symposium, pages 73–80. Society for Modeling & Simulation International, 2011.

[37]

Michael J. McLaughlin and Hessam S. Sarjoughian. DEVS-Scripting: A black-box test frame for DEVS models. In Proceedings of the 2020 Winter Simulation Conference (WSC), pages 2666–2677. IEEE, 2020.

[38]

Tsong Yueh Chen, S. C. Cheung, and S. M. Yiu. Metamorphic testing: A new approach for generating next test cases. Technical Report HKUST-CS98-01, 1998.

[39]

Xingyao Wang, Simon Rosenberg, Juan Michelini, Calvin Smith, Hoang Tran, Engel Nyst, Rohit Malhotra, Xuhui Zhou, Valerie Chen, Robert Brennan, and Graham Neubig. The OpenHands software agent SDK: A composable and extensible foundation for production agents, 2025.

[40]

**Separate Concerns**: Isolate the *Core Logic* (Model) from the *External Controller*.
- The External Controller is responsible for stdin reading, external event injecting, arguments parsing, simulation control, and writing list[dict] logs in logger to stdout/stderr as JSONL.
- The Core Model is ...
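The separation between the Core Model and the External Controller can be sketched with two hypothetical classes. `CoreModel` only updates state and appends `list[dict]` log records. `Controller` injects external events in time order and serializes the logs as JSONL. The event content (an integer `value` accumulated into `total`) is invented purely for illustration, and writing to an injectable stream rather than hard-coding stdout is a design choice of this sketch.

```python
import io
import json
import sys


class CoreModel:
    """Core logic only: reacts to events and appends dict logs; performs no I/O."""

    def __init__(self):
        self.logs = []
        self.total = 0

    def on_external(self, time, value):
        self.total += value
        self.logs.append({"time": time, "entity": "Core", "event": "received",
                          "payload": {"value": value, "total": self.total}})


class Controller:
    """External controller: injects events in time order, then writes logs as JSONL."""

    def __init__(self, model, out=None):
        self.model = model
        self.out = out if out is not None else sys.stdout

    def run(self, events):
        for time, value in sorted(events):
            self.model.on_external(time, value)
        for record in self.model.logs:
            self.out.write(json.dumps(record) + "\n")
        self.model.logs.clear()


buf = io.StringIO()
Controller(CoreModel(), out=buf).run([(2.0, 5), (1.0, 3)])
print(buf.getvalue(), end="")
```

Because the model never touches I/O, the same `CoreModel` can be driven by a test harness, a CLI wrapper, or an agent loop without changes.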

[41]

**Extract Logic**: You are not writing code, but you MUST extract the specific algorithmic rules (model structure, math, probabilities, delays) into the `function` field

[42]

**Extract Logging**: If any logging/event_output is required, describe what to log in the `logging` field. - You can add new logging events, but do not change the existing ones. - Unless specified, the output should be completed by the core model. ## [Input Data] **Target Model Name**: `{name}` **Raw Requirement...

[43]

**Type Descriptions**:
- This is the requirements for the `model_init_args`, `input_ports`, and `output_ports`.
- Only int, float, bool, str, dict, list are allowed

[44]

**Complex Type Schema**:
- For any input or output port that uses a 'dict' or 'list' type, you MUST explicitly describe the structure in the structure field. This includes listing all keys, their types, and a brief structure for each key.
- **No Gold Plating**: strictly adhere to the data fields specified in the [Parent...
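The type rules in [43] and [44] lend themselves to a mechanical check. The sketch below assumes a hypothetical JSON-like spec layout. In it, each entry of `model_init_args`, `input_ports`, and `output_ports` carries `name` and `type`, plus a `structure` description for `dict`/`list` types. The `function` and `logging` values are placeholders. The real schema used by the pipeline is not shown in the source.

```python
ALLOWED_TYPES = {"int", "float", "bool", "str", "dict", "list"}


def check_spec(spec):
    """Return rule violations in a (hypothetical) structured model spec."""
    problems = []
    for section in ("model_init_args", "input_ports", "output_ports"):
        for entry in spec.get(section, []):
            name, typ = entry.get("name", "?"), entry.get("type")
            if typ not in ALLOWED_TYPES:
                problems.append(f"{section}.{name}: type {typ!r} not allowed")
            elif typ in ("dict", "list") and not entry.get("structure"):
                # Complex types must describe their keys/element structure.
                problems.append(f"{section}.{name}: {typ} needs a 'structure' description")
    return problems


spec = {
    "name": "Sender",
    "function": "<algorithmic rules extracted from the requirement>",
    "logging": "<events to log and their keys>",
    "model_init_args": [{"name": "sender_delay", "type": "float"}],
    "input_ports": [{"name": "ack", "type": "dict", "structure": {"bit": "int"}}],
    "output_ports": [{"name": "packet", "type": "dict"}, {"name": "debug", "type": "tuple"}],
}
print(check_spec(spec))
# ["output_ports.packet: dict needs a 'structure' description", "output_ports.debug: type 'tuple' not allowed"]
```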

[45]

**Active Trigger**: "In a closed feedback loop (A↔B), configure Model A to send an initial 'start' signal to Model B at T=0 to kick off the cycle."

[46]

**Pre-loaded Credit**: "Initialize the Manager with `credits=N`, allowing it to dispatch N tasks immediately without waiting for the first 'worker_free' signals."
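The Pre-loaded Credit pattern can be illustrated with a small hypothetical `Manager`. Each credit permits one outstanding task. `start` dispatches as many tasks as the initial credits allow at T=0, and each `worker_free` event returns one credit. Class and method names are assumptions for this sketch.

```python
from collections import deque


class Manager:
    """Credit-based dispatcher: each credit allows one outstanding task."""

    def __init__(self, tasks, credits):
        self.pending = deque(tasks)
        self.credits = credits     # pre-loaded, so dispatch can begin at T=0
        self.dispatched = []       # (time, task) pairs

    def _dispatch(self, time):
        # Send tasks while both credits and pending work remain.
        while self.credits > 0 and self.pending:
            self.credits -= 1
            self.dispatched.append((time, self.pending.popleft()))

    def start(self, time=0.0):
        self._dispatch(time)

    def on_worker_free(self, time):
        self.credits += 1          # a worker finished, so one credit comes back
        self._dispatch(time)


m = Manager(["t1", "t2", "t3", "t4"], credits=2)
m.start()               # dispatches t1 and t2 at T=0
m.on_worker_free(5.0)   # one credit returned, so t3 goes out
print(m.dispatched)     # [(0.0, 't1'), (0.0, 't2'), (5.0, 't3')]
```

With `credits=0`, `start` dispatches nothing. Since no worker is busy, no `worker_free` signal ever arrives, which is the cold-start deadlock the pre-loaded credits avoid.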
