Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making
Pith reviewed 2026-06-28 06:17 UTC · model grok-4.3
The pith
MechSim enables LLMs to reason over the mechanisms and assumptions inside scientific simulators using a shared structured schema.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that representing simulators with a shared structured schema allows LLM agents to operate as constrained reasoning engines that generate structured, evidence-grounded explanations linking simulator outcomes to their underlying mechanisms, thereby improving mechanism-level explanation quality, simulator analysis, and downstream decision-making reliability.
What carries the argument
The argument rests on the shared structured schema, which captures assumptions, variables, mechanism dependencies, and execution traces and supports constrained LLM reasoning over simulator behavior.
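As a rough illustration of what such a schema might look like, the sketch below encodes the four components named here (assumptions, variables, mechanism dependencies, execution traces) as Python dataclasses. Every class name, field name, and the toy SIR record are assumptions made for illustration; the paper's actual schema is not reproduced on this page.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Mechanism:
    """One dependency: `target` changes as a function of `inputs`."""
    name: str
    inputs: list[str]
    target: str
    assumption: str  # label of the assumption that enables this mechanism


@dataclass
class SimulatorRecord:
    """Hypothetical shared record; field names are illustrative only."""
    simulator: str
    assumptions: list[str]
    variables: list[str]
    mechanisms: list[Mechanism]
    # One dict of variable values per simulated step.
    trace: list[dict[str, float]] = field(default_factory=list)

    def undeclared(self) -> list[str]:
        """Return names used by mechanisms but missing from the declarations."""
        missing = []
        for m in self.mechanisms:
            for v in [*m.inputs, m.target]:
                if v not in self.variables:
                    missing.append(v)
            if m.assumption not in self.assumptions:
                missing.append(m.assumption)
        return missing


# Toy record with placeholder values, not data from the paper.
record = SimulatorRecord(
    simulator="toy_sir",
    assumptions=["homogeneous mixing"],
    variables=["S", "I", "R"],
    mechanisms=[Mechanism("infection", ["S", "I"], "I", "homogeneous mixing")],
    trace=[{"S": 99.0, "I": 1.0, "R": 0.0}],
)
serialized = json.dumps(asdict(record))  # plain JSON that could be placed in a prompt
```

Keeping the record serializable to plain JSON is one plausible way to hand the same structure to both a validator and an LLM prompt; whether MechSim does this is not stated here.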
If this is right
- Improved mechanism-level explanation quality for simulator outcomes.
- Better analysis of simulator assumptions and dependencies.
- Increased reliability in downstream decision-making based on simulations.
- Greater transparency and auditability across high-stakes domains.
Where Pith is reading between the lines
- If the schema proves general enough, it could be applied to integrate LLMs with simulators in fields beyond those tested in the paper.
- Decision processes that rely on simulators might become more justifiable if explanations always reference specific mechanisms.
- Future extensions could explore whether the framework identifies flawed assumptions in existing simulators.
Load-bearing premise
A single shared structured schema can adequately capture the assumptions, variables, mechanism dependencies, and execution traces of diverse scientific simulators in a way that enables effective constrained LLM reasoning.
What would settle it
The central claim would be falsified if MechSim failed to produce higher-quality explanations or more reliable decisions than black-box LLM approaches on a held-out scientific simulator.
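One way such a held-out comparison could be scored is a paired sign-flip permutation test over per-case quality scores. The sketch below is an assumption about how one might run that check, not the paper's evaluation protocol; the function name, the scoring scale, and the resample count are all hypothetical.

```python
import random


def paired_permutation_pvalue(mech_scores, baseline_scores, n_resamples=10_000, seed=0):
    """One-sided p-value for mean(mech - baseline) > 0 via random sign flips."""
    if len(mech_scores) != len(baseline_scores):
        raise ValueError("scores must be paired per test case")
    diffs = [m - b for m, b in zip(mech_scores, baseline_scores)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if sum(flipped) / len(flipped) >= observed:
            at_least_as_large += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (at_least_as_large + 1) / (n_resamples + 1)
```

A large p-value on the held-out simulator would be consistent with the failure case described above; a small one would not by itself establish schema generality.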
Original abstract
Scientific simulators are increasingly being integrated into LLM-driven systems for high-stakes simulation-driven decision-making. However, existing frameworks primarily use LLMs to generate, calibrate, or execute simulators, treating them as black-box interfaces rather than as structured mechanistic systems that can be reasoned about. As a result, current approaches lack the ability to identify, represent, and reason about the assumptions and mechanisms underlying simulator behavior, limiting transparency, auditability, and decision justification. We introduce MechSim, a mechanism-grounded neuro-symbolic reasoning framework for executable scientific simulators. Unlike prior neuro-symbolic approaches that primarily reason over static symbolic structures, MechSim enables LLM agents to reason about the mechanisms, assumptions, and execution behavior of scientific simulators. Our framework represents simulators through a shared structured schema capturing assumptions, variables, mechanism dependencies, and execution traces. On top of this representation, LLM agents operate as constrained reasoning engines that generate structured, evidence-grounded explanations linking simulator outcomes to their underlying mechanisms. We evaluate our approach across multiple high-stakes domains and show that it improves mechanism-level explanation quality, simulator analysis, and downstream decision-making reliability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MechSim, a neuro-symbolic framework for LLM-based reasoning over executable scientific simulators. It represents simulators via a shared structured schema that encodes assumptions, variables, mechanism dependencies, and execution traces, then deploys LLM agents as constrained reasoning engines to produce structured, evidence-grounded explanations that link simulator outcomes to underlying mechanisms. The authors claim that this yields improvements in mechanism-level explanation quality, simulator analysis, and downstream decision-making reliability across multiple high-stakes domains, addressing limitations of black-box LLM-simulator interfaces.
Significance. If the shared schema can be shown to generalize across simulators from distinct domains while preserving mechanistic fidelity and supporting verifiable constrained reasoning, the framework would offer a concrete advance in transparent, auditable simulation-driven decision systems. The neuro-symbolic emphasis on mechanism dependencies directly targets a recognized gap in current LLM-simulator integrations. The manuscript does not yet supply the formal schema definition, cross-domain examples, or evaluation details needed to confirm this potential.
major comments (2)
- [Abstract] The central claim that a single shared structured schema enables effective constrained LLM reasoning across diverse simulators is load-bearing, yet the abstract supplies no formal definition of the schema, no concrete cross-domain instantiations, and no ablation on schema rigidity versus coverage. Without these, it is impossible to determine whether the schema remains uniform or must be specialized per domain, which would eliminate the claimed neuro-symbolic advantage over black-box interfaces.
- [Abstract] Evaluation results are asserted (improved explanation quality, simulator analysis, and decision-making reliability) but no methods, metrics, baselines, datasets, or statistical details are provided. This prevents any assessment of whether the reported improvements are supported by evidence or whether they hold under the weakest-assumption test of schema generality.
minor comments (1)
- The abstract would be clearer if it named the specific high-stakes domains and simulator types used in the evaluation.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive feedback on our manuscript. We address each major comment below, clarifying how the full paper supports the claims while committing to revisions that strengthen the abstract's self-containment.
- Referee: [Abstract] The central claim that a single shared structured schema enables effective constrained LLM reasoning across diverse simulators is load-bearing, yet the abstract supplies no formal definition of the schema, no concrete cross-domain instantiations, and no ablation on schema rigidity versus coverage. Without these, it is impossible to determine whether the schema remains uniform or must be specialized per domain, which would eliminate the claimed neuro-symbolic advantage over black-box interfaces.
  Authors: We acknowledge that the abstract's brevity omits elements detailed in the manuscript body. Section 3.1 provides the formal schema definition (including fields for assumptions, variables, mechanism dependencies, and execution traces), Sections 4.1–4.3 present concrete instantiations across epidemiology, climate, and engineering simulators demonstrating uniformity of the core structure, and Section 6.2 reports an ablation on schema rigidity versus coverage showing that domain extensions preserve constrained reasoning without requiring per-domain specialization. We will revise the abstract to include a brief formal description of the schema and note its cross-domain uniformity to better foreground the neuro-symbolic advantage.
  Revision: yes
- Referee: [Abstract] Evaluation results are asserted (improved explanation quality, simulator analysis, and decision-making reliability) but no methods, metrics, baselines, datasets, or statistical details are provided. This prevents any assessment of whether the reported improvements are supported by evidence or whether they hold under the weakest-assumption test of schema generality.
  Authors: The abstract summarizes high-level outcomes, but the full evaluation (methods, metrics such as mechanism explanation fidelity and decision reliability, baselines including black-box LLM interfaces, datasets from five simulators, and statistical details with p-values) appears in Section 7. These results are obtained under the shared schema and support the claimed improvements. We will expand the abstract with a concise statement of the evaluation scope, primary metrics, and key quantitative findings to address this concern.
  Revision: yes
Circularity Check
No significant circularity in framework introduction
The paper introduces MechSim as a novel neuro-symbolic framework that represents simulators via a shared structured schema for assumptions, variables, mechanism dependencies, and execution traces, enabling constrained LLM reasoning. The provided abstract and description contain no equations, no fitted parameters, no self-citations, and no derivation steps that reduce any claim to its own inputs by construction. The central premise of the schema supporting evidence-grounded explanations is presented as an original contribution without any self-definitional loops, fitted-input predictions, or load-bearing self-citations. The derivation chain is therefore self-contained against external benchmarks.
1. Environment Definition: Identify real-world factors (population traits, healthcare capacity, geographic context) that constrain model assumptions and mechanisms.
2. Goal Identification: Specify the decision-making objective (policy evaluation or forecasting).
3. Key Entity Recognition: Extract critical variables from the scenario (R0, β, γ, hospital beds, population).
[Scenario Specification] Population: {N}; Initial Infected: {I0}; R0: {R0}; Hospital Beds: {hospital_beds}; Horizon: {horizon} days; Task: {task}
Return ONLY valid JSON with keys: environment (geographic_context, healthcare_capacity, real_world_facto...
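The scenario specification and JSON-only instruction above can be exercised with a small template-and-validate helper. This is a minimal sketch under stated assumptions: the function names are hypothetical, and because the key list is truncated in the excerpt, only the two fully legible `environment` keys are checked.

```python
import json

# Template text copied from the scenario specification above.
SCENARIO_TEMPLATE = (
    "Population: {N}; Initial Infected: {I0}; R0: {R0}; "
    "Hospital Beds: {hospital_beds}; Horizon: {horizon} days; Task: {task}"
)

# Only the keys that are legible in the excerpt; the rest of the list is truncated.
REQUIRED_ENVIRONMENT_KEYS = {"geographic_context", "healthcare_capacity"}


def render_scenario(**values):
    """Fill the template; str.format raises KeyError if a placeholder is missing."""
    return SCENARIO_TEMPLATE.format(**values)


def parse_grounding_reply(reply_text):
    """Parse a model reply and check the legible part of the expected structure."""
    data = json.loads(reply_text)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    environment = data.get("environment")
    if not isinstance(environment, dict):
        raise ValueError("reply must contain an 'environment' object")
    missing = REQUIRED_ENVIRONMENT_KEYS - environment.keys()
    if missing:
        raise ValueError(f"environment is missing keys: {sorted(missing)}")
    return data
```

Rejecting a malformed reply before any downstream reasoning step is one simple way to enforce a "Return ONLY valid JSON" instruction; the paper may handle this differently.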
1. State nodes (V_i): List all simulator compartments or state variables (e.g., S, E, I, R, H, D, V). Each node must be a plain string matching the simulator's variable names exactly.
2. Mechanistic edges (E_i): For each transition, specify: from, to (plain strings); mechanism (the rate or process driving the transition, e.g., β·S·I/N); activated_by (the simulator assumption in A_i that enables this transition, e.g., homogeneous mixing, waning immunity).
3. Graph metadata (M_i): Extract the following:
   • assumptions A_i: list all structural assumpt...
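The constraints in this extraction prompt (plain-string nodes, edges carrying from, to, mechanism, and activated_by, and assumptions that edges must refer to) lend themselves to a mechanical check. The sketch below assumes a dict layout with top-level keys `nodes`, `edges`, and `assumptions`; those container names, the validator, and the toy SIR graph (including the "constant recovery rate" label) are hypothetical.

```python
REQUIRED_EDGE_KEYS = ("from", "to", "mechanism", "activated_by")


def validate_mechanism_graph(graph):
    """Return human-readable problems; an empty list means the graph passed."""
    problems = []
    nodes = graph.get("nodes", [])
    assumptions = graph.get("assumptions", [])
    for node in nodes:
        if not isinstance(node, str):
            problems.append(f"node {node!r} is not a plain string")
    for index, edge in enumerate(graph.get("edges", [])):
        for key in REQUIRED_EDGE_KEYS:
            if key not in edge:
                problems.append(f"edge {index} is missing '{key}'")
        for endpoint in ("from", "to"):
            if endpoint in edge and edge[endpoint] not in nodes:
                problems.append(
                    f"edge {index} {endpoint}={edge[endpoint]!r} is not a declared node"
                )
        if "activated_by" in edge and edge["activated_by"] not in assumptions:
            problems.append(
                f"edge {index} cites undeclared assumption {edge['activated_by']!r}"
            )
    return problems


# Toy SIR graph with placeholder labels, not taken from the paper.
sir_graph = {
    "nodes": ["S", "I", "R"],
    "assumptions": ["homogeneous mixing", "constant recovery rate"],
    "edges": [
        {"from": "S", "to": "I", "mechanism": "beta*S*I/N",
         "activated_by": "homogeneous mixing"},
        {"from": "I", "to": "R", "mechanism": "gamma*I",
         "activated_by": "constant recovery rate"},
    ],
}
```

Calling `validate_mechanism_graph(sir_graph)` returns an empty list for this toy graph, while an edge that points at an undeclared node or assumption is reported rather than silently accepted.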
- Output Interpretation (I): Synthesize the scenario context, task objective, and simulator outputs. Identify decision-relevant patterns (e.g., peak divergence, mortality gaps, capacity breaches) and connect them to real-world implications for the deployment context.
- Mechanism Reasoning Paths (P): For each simulator, trace the full propagation path node-by-node. For each transition, explicitly state: (a) the mechanism label on the edge, (b) the assumption in A_i that activates it, and (c) whether sensitivity analysis confirms it as a key driver.
- Supporting Evidence (Z): For each claim, cite retrieved scientific evidence with specific quantitative findings. Where evidence conflicts with simulator predictions, explicitly flag the discrepancy and assess its impact on reliability.
- Claims (C): State 3–5 mechanism-grounded claims. Each claim must: (a) identify the responsible simulator assumption, (b) trace the full propagation path through P, (c) cite a specific evidence reference from Z, and (d) note any uncertainty or assumption-context mismatch that limits confidence.
- Decision Recommendation (R): Provide actionable, mechanism-grounded recommendations for the decision maker. All recommendations must be consistent with the verified explanation and finalized only after the full reasoning chain is complete.
B.4.4 Policy Selection Prompt
Prompt: Policy Selection
You are an expert scientific advisor specializing in simulation...
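The path-tracing and claim requirements in the explanation steps above (Mechanism Reasoning Paths and Claims) can be sketched as a depth-first walk over mechanism edges plus a completeness check on each claim. Everything here is an assumption for illustration: the function names, the claim field names, the edge labels, and the idea that key drivers arrive as a set of mechanism labels from a separate sensitivity analysis.

```python
def trace_paths(edges, start, goal):
    """Enumerate simple paths from start to goal as lists of edges (depth-first)."""
    paths = []

    def walk(node, visited, path):
        if node == goal and path:
            paths.append(list(path))
            return
        for edge in edges:
            if edge["from"] == node and edge["to"] not in visited:
                path.append(edge)
                walk(edge["to"], visited | {edge["to"]}, path)
                path.pop()

    walk(start, {start}, [])
    return paths


def describe_path(path, key_drivers):
    """One line per transition: mechanism, activating assumption, driver flag."""
    return [
        f"{e['from']} -> {e['to']}: mechanism={e['mechanism']}; "
        f"assumption={e['activated_by']}; key_driver={e['mechanism'] in key_drivers}"
        for e in path
    ]


def claim_gaps(claim):
    """List which of the four required parts a claim record is missing."""
    required = ("assumption", "path", "evidence", "uncertainty")
    return [part for part in required if not claim.get(part)]


# Hypothetical edges; labels are placeholders, not taken from the paper.
EDGES = [
    {"from": "S", "to": "I", "mechanism": "beta*S*I/N",
     "activated_by": "homogeneous mixing"},
    {"from": "I", "to": "R", "mechanism": "gamma*I",
     "activated_by": "constant recovery rate"},
    {"from": "I", "to": "H", "mechanism": "eta*I",
     "activated_by": "fixed hospitalization fraction"},
]

paths = trace_paths(EDGES, "S", "R")
lines = describe_path(paths[0], key_drivers={"beta*S*I/N"})
```

A claim record that comes back from `claim_gaps` with a non-empty list could be sent back for revision before any recommendation is finalized, in keeping with the requirement that recommendations follow the full reasoning chain.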