GenWorld: Empirically Grounded Urban Simulation Infrastructure for Scalable LLM-Agent Studies

Gen Li; Jieyuan Lan; Masaki Ogura; Pengcheng Xu; Tao Feng; Zongyuan Wu

arxiv: 2606.27650 · v1 · pith:JBETXDNVnew · submitted 2026-06-26 · 💻 cs.MA

Pith reviewed 2026-06-29 02:51 UTC · model grok-4.3

classification 💻 cs.MA

keywords LLM agents · urban simulation · synthetic city · multi-agent systems · policy compilation · grounded simulation · agent-environment interface · scalable rollout

The pith

GenWorld combines a building-level synthetic city with offline LLM policy compilation to enable scalable urban agent simulations grounded in real census and mobility data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the dual challenge of making LLM agents behave realistically in city environments while handling populations too large for constant live model calls. It introduces GenWorld as an infrastructure that builds a synthetic city from geospatial and census records, defines a structured interface between agents and that city, and converts LLM-generated decisions into reusable lookup tables. A concrete version for Higashihiroshima places 196,608 residents in buildings, checks consistency with official statistics, and cross-checks travel distances against mobile-phone records. Three example runs show the system supporting full-city weekday simulations, weekday-versus-weekend differences, and auditable responses to warnings. The authors position the platform as a reproducible base for future grounded studies rather than a finished forecasting tool.

Core claim

GenWorld supplies an empirically grounded urban simulation infrastructure that merges a building-level synthetic city, a structured agent-environment interface, and offline compilation of LLM-derived decision signals into lookup policies, allowing scalable rollout. The reference implementation for Higashihiroshima anchors 196,608 synthetic residents in census and geospatial data, validates demographic consistency, and uses YJMob100K data as a commuting-distance check. Demonstrations cover full-city weekday rollouts, weekday-weekend contrasts, and warning-response perturbations.

What carries the argument

Offline compilation of LLM-derived decision signals into lookup policies, which replaces repeated online calls with fast table lookups while retaining signals from the original model outputs.
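
A minimal sketch of what such a compilation could look like, under assumptions that are not taken from the paper: the context key `(persona, day_type)`, the activity names, and the 0–10 scores are hypothetical placeholders for preferences elicited offline from an LLM, and normalizing scores into probabilities is one plausible choice among several. The scores are turned into a sampling table once; each simulation-time decision is then a dictionary lookup plus a weighted draw.

```python
import random

# Hypothetical offline scores (0-10) standing in for LLM-elicited preferences.
# Keys, activities, and values are illustrative only, not from the paper.
OFFLINE_SCORES = {
    ("student", "weekday"): {"study": 9, "leisure": 3, "shopping": 2},
    ("student", "weekend"): {"study": 3, "leisure": 8, "shopping": 5},
}


def compile_policy(scores):
    """Normalize each score row into a probability table (done once, offline)."""
    policy = {}
    for key, row in scores.items():
        total = sum(row.values())
        if total <= 0:
            raise ValueError(f"no positive score for context {key}")
        policy[key] = {action: s / total for action, s in row.items()}
    return policy


def act(policy, key, rng):
    """Simulation-time decision: one table lookup plus one weighted draw."""
    row = policy[key]
    actions = list(row)
    weights = [row[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


POLICY = compile_policy(OFFLINE_SCORES)
rng = random.Random(0)  # fixed seed so a rollout can be repeated
sample = [act(POLICY, ("student", "weekday"), rng) for _ in range(5)]
```

In this sketch a context key that was never compiled raises a `KeyError`, which makes coverage gaps visible instead of silently falling back to a default.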

If this is right

A full-city weekday simulation becomes computationally feasible for hundreds of thousands of agents.
Weekday-weekend behavioral contrasts can be generated reproducibly from the same infrastructure.
Perturbation experiments such as warning responses can include auditable replanning traces.
Demographic consistency checks against census tabulations can be repeated for new cities.
Mobile-phone data can serve as an external diagnostic for commuting distances.
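
The demographic check in the list above can be illustrated with a short sketch. Everything here is an assumption for illustration: the age bands, the counts, and the two error measures (per-category relative error and total absolute error as a share of the census total) are hypothetical, and the source does not say which metrics the authors use.

```python
# Hypothetical counts per age band; not taken from the paper or any census table.
CENSUS = {"0-14": 100, "15-64": 600, "65+": 300}
SYNTHETIC = {"0-14": 110, "15-64": 590, "65+": 300}


def marginal_errors(synthetic, census):
    """Relative error of each synthetic category count against the census count."""
    if set(synthetic) != set(census):
        raise ValueError("category sets differ")
    errors = {}
    for category, target in census.items():
        if target == 0:
            raise ValueError(f"census count is zero for {category}")
        errors[category] = (synthetic[category] - target) / target
    return errors


def total_absolute_error_share(synthetic, census):
    """Sum of absolute count differences divided by the census total."""
    if set(synthetic) != set(census):
        raise ValueError("category sets differ")
    diff = sum(abs(synthetic[c] - census[c]) for c in census)
    return diff / sum(census.values())


errors = marginal_errors(SYNTHETIC, CENSUS)
share = total_absolute_error_share(SYNTHETIC, CENSUS)
```

Because both functions take plain dictionaries, the same check could be rerun for another city by swapping in that city's tabulations.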

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same compilation approach could be tested on other cities if equivalent census and mobility datasets exist.
If the lookup policies prove stable, the framework might later support controlled experiments on policy interventions such as evacuation routes.
Calibration against observed traffic or evacuation outcomes would be needed before treating outputs as forecasts.
The structured interface might allow swapping in different agent decision models without rebuilding the city layer.

Load-bearing premise

Offline lookup policies compiled from LLM outputs will preserve enough behavioral fidelity for the intended uses even without direct quantitative checks against live LLM behavior during rollout.

What would settle it

A side-by-side comparison in which live LLM agents and their compiled lookup-policy counterparts, run on the same synthetic city, produce significantly different aggregate statistics for commuting distances or activity patterns.
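
One way such a comparison could be scored is a two-sample Kolmogorov-Smirnov test on commuting distances from the two runs. The sketch below is an illustration, not a procedure from the paper: the distance samples are hypothetical, and the critical value uses the standard large-sample approximation, which is rough for samples this small.

```python
import bisect
import math


def ks_statistic(xs, ys):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The supremum is reached at a data point, so checking those is enough.
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / n
        fy = bisect.bisect_right(ys, v) / m
        d = max(d, abs(fx - fy))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value: c(alpha) * sqrt((n + m) / (n * m))."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


# Hypothetical commuting distances in km, for illustration only.
live_km = [1.0, 2.0, 3.0, 4.0]
compiled_km = [11.0, 12.0, 13.0, 14.0]

d = ks_statistic(live_km, compiled_km)
reject = d > ks_critical(len(live_km), len(compiled_km))
```

With city-scale populations even tiny gaps become statistically significant, so the size of `d` is worth reporting alongside the reject decision.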

Figures

Figures reproduced from arXiv: 2606.27650 by Gen Li, Jieyuan Lan, Masaki Ogura, Pengcheng Xu, Tao Feng, Zongyuan Wu.

**Figure 1.** Multi-scale spatial granularity of GenWorld’s empirically grounded urban world in Higashihiroshima, Hiroshima, Japan. (A) City-level view showing 196,608 individuals distributed across georeferenced buildings, validated against census data. (B) District-level view near Hiroshima University, revealing diverse building types (residential, commercial, educational) with topographic context and elevation data…

**Figure 2.** Query-conditioned prompt construction for our structured decision interface. Raw persona/state…

**Figure 3.** Plan-to-trajectory execution with a two…

**Figure 4.** Teacher preference scores (0–10) for ActivityPreference across persona categories (rows) and candidate activity types (columns), shown separately for maintenance (left) and leisure (right). The scores define the simulation-time sampling distribution used by the compiled policy.

**Figure 5.** Distilled teacher scores for DayPlan intention-chain templates, shown separately for weekday and weekend candidate sets. Panels: (a) weekday intention-chain template preference; (b) weekend intention-chain template preference.

… to be allocated where it matters most while keeping simulation-time inference amortized constant-time with respect to the number of agents and decision steps. This discretization trades off fidelity for tractability: behavior matching depends on context key design and coverage, and uns…

**Figure 6.** Commuting pattern extraction from YJMob100K after registering the anonymized mesh grid to our study area. The figure visualizes inferred home/work points and commuting distance statistics for the extracted subregion.

**Figure 7.** Commuting distance distributions under building-level grounding versus a tract-centroid baseline. The baseline collapses within-tract heterogeneity by placing all households at tract centroids, illustrating how coarse spatial grounding can distort short-range commuting structure even when workplace assignments are held fixed.

6 Platform Architecture. GenWorld emphasizes modularity (independent component…

**Figure 8.** GenWorld System Architecture. The platform is organized into three layers: Population & Envi…

**Figure 9.** Day–night contrast of visualized resident…

**Figure 10.** 24-hour activity occupancy distribution in the baseline rollout, shown as a radial stacked plot (outer radius indicates more people). The visualization highlights the expected day–night cycle: home/sleep dominates overnight, work and study increase during daytime hours, and leisure and other discretionary activities rise in the evening.

**Figure 11.** All-day road-network traffic flow aggre…

Original abstract

LLM-agent simulation faces a joint grounding and scaling problem: agents should act in environments that reflect real urban constraints, yet direct online LLM calls for city-scale populations are computationally prohibitive. We present GenWorld, an empirically grounded urban simulation infrastructure that combines a building-level synthetic city, a structured agent-environment interface, and offline compilation of LLM-derived decision signals into lookup policies for scalable rollout. In a reference instantiation for Higashihiroshima, Japan, GenWorld grounds 196,608 synthetic residents in census and geospatial data, validates demographic consistency against census tabulations, and uses YJMob100K mobile-phone data as a commuting-distance diagnostic. We demonstrate the infrastructure through three reproducible cases: a full-city weekday rollout, a weekday-weekend behavioral contrast, and a warning-response perturbation with auditable replanning traces. These cases support GenWorld as a reproducible platform for grounded and scalable LLM-agent studies, while calibrated forecasting for traffic, evacuation, or policy outcomes remains future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

GenWorld gives a workable infrastructure for city-scale LLM-agent sims by grounding a synthetic population in census data and compiling decisions offline, but the missing fidelity check between live LLM outputs and the lookup policies is a real gap.

The one thing to know is that GenWorld offers a way to run LLM-agent simulations in a data-grounded city model at full scale by turning LLM outputs into static lookup policies for the agents.

GenWorld uses census and geospatial data to create 196,608 synthetic residents in Higashihiroshima, checks that the demographics match official tabulations, and uses mobile-phone records as a diagnostic for commuting distances. The authors then run three cases: a complete weekday simulation, a comparison of weekday and weekend behavior, and a scenario in which agents respond to a warning with replanning that can be traced.

This setup is useful because it makes large experiments feasible while keeping the environment realistic. The structured interface and offline compilation are presented as the practical solution to the scaling issue.

The concern that stands out is the lack of evidence that the compiled policies behave like the original LLM would. The paper does not report any agreement rates or divergence measures on test states, so the grounding for the agent decisions rests on an untested assumption. That is the main soft spot, and it is not minor given the central role of the LLM in the system.

Readers who work on agent-based modeling or LLM applications in social science would get the most from it. The paper shows clear thinking about the practical constraints and provides enough to reproduce the cases.

I think it should go to peer review, with the expectation that the authors will need to add quantitative checks on the policy compilation.

Referee Report

2 major / 2 minor

Summary. The manuscript presents GenWorld, an infrastructure designed to address the grounding and scaling challenges in LLM-agent urban simulations. It integrates a building-level synthetic city model, a structured agent-environment interface, and an offline compilation method that converts LLM-derived decision signals into lookup policies. The system is instantiated for Higashihiroshima, Japan, grounding 196,608 synthetic residents using census and geospatial data, with demographic validation against census tabulations and commuting-distance diagnostics from YJMob100K mobile-phone data. Three reproducible demonstration cases are provided: a full-city weekday rollout, a weekday-weekend behavioral contrast, and a warning-response perturbation with auditable replanning traces. The authors position GenWorld as a platform for grounded and scalable LLM-agent studies, noting that calibrated forecasting is future work.

Significance. If the offline-compiled lookup policies retain sufficient behavioral fidelity to the original LLM decisions, GenWorld would represent a significant contribution by enabling city-scale simulations that are both empirically grounded in real data and computationally scalable. The grounding in census and mobile data, combined with the reproducible demonstration cases, would support its use as a platform for studying agent behaviors in urban settings. The explicit acknowledgment that calibrated forecasting remains future work appropriately scopes the current contribution.

major comments (2)

[§3 (Policy Compilation)] The central claim that GenWorld enables grounded, scalable LLM-agent studies via offline compilation of LLM decision signals into lookup policies requires that the compiled policies preserve behavioral fidelity. No quantitative metric (e.g., action-distribution divergence, trajectory statistics, or decision agreement rate) is reported comparing live LLM outputs to the lookup tables on held-out states. This is load-bearing for asserting that the scalability benefit retains the grounding property.
[§2 (Reference Instantiation)] Demographic consistency is asserted against census tabulations and YJMob100K is invoked as a commuting-distance diagnostic for the 196,608 residents, but no error metrics, sample sizes, or exclusion rules are supplied. This weakens the ability to assess the strength of the empirical grounding claim.
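
Two of the fidelity metrics named in the first comment can be sketched briefly. This is an illustration under stated assumptions, not the authors' evaluation: the action labels and the paired decision lists are hypothetical, and each list is assumed to hold one decision per held-out state, in the same state order.

```python
import math
from collections import Counter


def agreement_rate(live, compiled):
    """Fraction of held-out states where both policies pick the same action."""
    if len(live) != len(compiled) or not live:
        raise ValueError("need two non-empty lists of equal length")
    return sum(a == b for a, b in zip(live, compiled)) / len(live)


def js_divergence(live, compiled):
    """Jensen-Shannon divergence (base 2, range 0 to 1) of the action frequencies."""
    p, q = Counter(live), Counter(compiled)
    n_p, n_q = sum(p.values()), sum(q.values())
    if n_p == 0 or n_q == 0:
        raise ValueError("both action lists must be non-empty")
    total = 0.0
    for action in set(p) | set(q):
        pa, qa = p[action] / n_p, q[action] / n_q
        mid = 0.5 * (pa + qa)  # mixture distribution
        if pa > 0:
            total += 0.5 * pa * math.log2(pa / mid)
        if qa > 0:
            total += 0.5 * qa * math.log2(qa / mid)
    return total


# Hypothetical decisions on four held-out states.
live_actions = ["home", "work", "work", "leisure"]
compiled_actions = ["home", "work", "leisure", "leisure"]

rate = agreement_rate(live_actions, compiled_actions)
jsd = js_divergence(live_actions, compiled_actions)
```

The two numbers answer different questions: agreement rate is per-state and penalizes a stochastic policy even when its distribution is right, while the divergence compares aggregate action frequencies only.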

minor comments (2)

[Abstract] Including at least one quantitative result (e.g., a specific error rate from the demographic validation) would strengthen the 'empirically grounded' assertion without altering the scope.
[Demonstration cases] The reproducibility of the three cases is positive, but the manuscript could specify the state-space cardinality or number of unique states compiled into the lookup policies to better contextualize the scalability gain.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the requirements for substantiating our claims on scalable grounded simulation. We respond to each major comment below.

Referee: [§3 (Policy Compilation)] The central claim that GenWorld enables grounded, scalable LLM-agent studies via offline compilation of LLM decision signals into lookup policies requires that the compiled policies preserve behavioral fidelity. No quantitative metric (e.g., action-distribution divergence, trajectory statistics, or decision agreement rate) is reported comparing live LLM outputs to the lookup tables on held-out states. This is load-bearing for asserting that the scalability benefit retains the grounding property.

Authors: We agree that a quantitative fidelity assessment is necessary to support the claim that offline compilation preserves the grounding property while enabling scale. The manuscript demonstrates the infrastructure via three reproducible cases that rely on the compiled policies but does not report a direct comparison (e.g., action-distribution divergence or decision agreement) against live LLM outputs on held-out states. In revision we will add this evaluation to §3, using a held-out set of states drawn from the same agent population.

revision: yes

Referee: [§2 (Reference Instantiation)] Demographic consistency is asserted against census tabulations and YJMob100K is invoked as a commuting-distance diagnostic for the 196,608 residents, but no error metrics, sample sizes, or exclusion rules are supplied. This weakens the ability to assess the strength of the empirical grounding claim.

Authors: We accept that the current presentation of the grounding validation lacks the quantitative detail needed for readers to evaluate its strength. The manuscript states that demographic consistency was checked against census tabulations and that YJMob100K served as a commuting-distance diagnostic, but supplies no error metrics, sample sizes, or exclusion criteria. In the revised manuscript we will expand §2 with these specifics, including the error metric(s) employed, the exact sample sizes for each comparison, and the data-exclusion rules applied to the mobile-phone traces.

revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on independent external data sources

The paper constructs a synthetic population from census and geospatial data, validates demographics directly against census tabulations, and uses YJMob100K as an external commuting-distance diagnostic. The offline compilation step converts LLM outputs to lookup policies for scalability but does not define any quantity in terms of itself or rename a fitted parameter as a prediction. No equations, self-citations, or uniqueness theorems are presented that would reduce the grounding claim to a tautology. The infrastructure is therefore self-contained against external benchmarks rather than circular by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is limited to premises explicitly required by the claims in the abstract.

axioms (2)

domain assumption: Census and geospatial data can be used to instantiate a representative synthetic population whose aggregate statistics match official tabulations.
Invoked when the authors state that 196,608 residents are grounded and validated against census tabulations.

domain assumption: Offline lookup policies compiled from LLM outputs retain enough behavioral fidelity to support the claimed simulation studies.
Required for the claim that the compilation step enables scalable rollout while remaining grounded.

pith-pipeline@v0.9.1-grok · 5712 in / 1280 out tokens · 50337 ms · 2026-06-29T02:51:40.264949+00:00 · methodology

Reference graph

Works this paper leans on

46 extracted references · 15 canonical work pages · 6 internal anchors

[1]

The gravity model.Annu

James E Anderson. The gravity model.Annu. Rev. Econ., 3(1):133–160, 2011

2011
[2]

Sallma: A software architec- ture for llm-based multi-agent systems

Marco Becattini, Roberto Verdecchia, and En- rico Vicario. Sallma: A software architec- ture for llm-based multi-agent systems. In 2025 IEEE/ACM International Workshop New Trends in Software Architecture (SATrends), pages 5–8. IEEE, 2025

2025
[3]

Global building morphology indicators.Computers, En- vironment and Urban Systems, 95:101809, 2022

Filip Biljecki and Yoong Shin Chow. Global building morphology indicators.Computers, En- vironment and Urban Systems, 95:101809, 2022. 19

2022
[4]

On the limits of agency in agent-based models.arXiv preprint arXiv:2409.10568, 2024

Ayush Chopra, Shashank Kumar, Nurullah Giray-Kuru, Ramesh Raskar, and Arnau Quera- Bofarull. On the limits of agency in agent-based models.arXiv preprint arXiv:2409.10568, 2024

work page arXiv 2024
[5]

Population synthesis using iterative pro- portional fitting (ipf): A review and future research.Transportation Research Procedia, 17:223–233, 2016

Abdoul-Ahad Choupani and Amir Reza Mam- doohi. Population synthesis using iterative pro- portional fitting (ipf): A review and future research.Transportation Research Procedia, 17:223–233, 2016

2016
[6]

Brookings Institution Press, 1996

Joshua M Epstein and Robert Axtell.Growing artificial societies: social science from the bot- tom up. Brookings Institution Press, 1996

1996
[7]

Reproducible methods for modeling combined public transport and cycling trips and associ- ated benefits: Evidence from the biclar tool

Rosa F´ elix, Filipe Moura, and Robin Lovelace. Reproducible methods for modeling combined public transport and cycling trips and associ- ated benefits: Evidence from the biclar tool. Computers, Environment and Urban Systems, 117:102230, 2025

2025
[8]

Citybench: Evaluating the capabilities of large language models for urban tasks.arXiv preprint arXiv:2406.13945, 2024

Jie Feng, Jun Zhang, Tianhui Liu, Xin Zhang, Tianjian Ouyang, Junbo Yan, Yuwei Du, Siqi Guo, and Yong Li. Citybench: Evaluating the capabilities of large language models for urban tasks.arXiv preprint arXiv:2406.13945, 2024. Accepted by KDD 2025 D&B Track

work page arXiv 2024
[9]

Kunihiko Fujiwara, Ryuta Tsurumi, Tomoki Kiyono, Zicheng Fan, Xiucheng Liang, Binyu Lei, Winston Yap, Koichi Ito, and Filip Biljecki. Voxcity: A seamless framework for open geospa- tial data integration, grid-based semantic 3d city model generation, and urban environment simu- lation.Computers, Environment and Urban Sys- tems, 123:102366, 2026

2026
[10]

Agentscope: A flexible yet robust multi-agent platform.arXiv preprint arXiv:2402.14034, 2024

Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, et al. Agentscope: A flexible yet ro- bust multi-agent platform.arXiv preprint arXiv:2402.14034, 2024

work page arXiv 2024
[11]

Understanding individual hu- man mobility patterns.nature, 453(7196):779– 782, 2008

Marta C Gonzalez, Cesar A Hidalgo, and Albert- Laszlo Barabasi. Understanding individual hu- man mobility patterns.nature, 453(7196):779– 782, 2008

2008
[12]

What about people in re- gional science.Transport Sociology: Social as- pects of transport planning, pages 143–158, 1970

Torsten H¨ agerstrand. What about people in re- gional science.Transport Sociology: Social as- pects of transport planning, pages 143–158, 1970

1970
[13]

Spatiotempo- ral patterns of urban human mobility.Journal of Statistical Physics, 151(1):304–318, 2013

Samiul Hasan, Christian M Schneider, Satish V Ukkusuri, and Marta C Gonz´ alez. Spatiotempo- ral patterns of urban human mobility.Journal of Statistical Physics, 151(1):304–318, 2013

2013
[14]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[15]

Metagpt: Meta programming for a multi-agent collaborative framework

Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. Metagpt: Meta programming for a multi-agent collaborative framework. In The twelfth international conference on learning representations, 2023

2023
[16]

Introducing matsim

Andreas Horni, Kai Nagel, and Kay W Ax- hausen. Introducing matsim. InMulti-Agent Transport Simulation MATSim. Ubiquity Press, 2016

2016
[17]

Large language models as simu- lated economic agents: What can we learn from homo silicus? Technical report, National Bureau of Economic Research, 2023

John J Horton. Large language models as simu- lated economic agents: What can we learn from homo silicus? Technical report, National Bureau of Economic Research, 2023

2023
[18]

A method to create a synthetic population with social networks for geographically-explicit agent- based models.Computational Urban Science, 2(1):7, 2022

Na Jiang, Andrew T Crooks, Hamdi Kavak, Annetta Burger, and William G Kennedy. A method to create a synthetic population with social networks for geographically-explicit agent- based models.Computational Urban Science, 2(1):7, 2022

2022
[19]

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. Swe-bench: Can language models resolve real-world github issues?arXiv preprint arXiv:2310.06770, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[20]

Trajllm: A modular llm- enhanced agent-based framework for realistic hu- man trajectory simulation

Chenlu Ju, Jiaxin Liu, Shobhit Sinha, Hao Xue, and Flora Salim. Trajllm: A modular llm- enhanced agent-based framework for realistic hu- man trajectory simulation. InCompanion Pro- ceedings of the ACM on Web Conference 2025, pages 2847–2850, 2025

2025
[21]

Nationwide synthetic human mobility dataset construction from limited travel sur- veys and open data.Computer-Aided Civil and Infrastructure Engineering, 39(21):3337– 3353, 2024

Takehiro Kashiyama, Yanbo Pang, Yuya Shibuya, Takahiro Yabe, and Yoshihide Seki- moto. Nationwide synthetic human mobility dataset construction from limited travel sur- veys and open data.Computer-Aided Civil and Infrastructure Engineering, 39(21):3337– 3353, 2024

2024
[22]

Recent develop- ment and applications of sumo-simulation of ur- ban mobility.International journal on advances in systems and measurements, 5(3&4):128–138, 2012

Daniel Krajzewicz, Jakob Erdmann, Michael Behrisch, Laura Bieker, et al. Recent develop- ment and applications of sumo-simulation of ur- ban mobility.International journal on advances in systems and measurements, 5(3&4):128–138, 2012. 20

2012
[23]

Compu- tational social science.Science, 323(5915):721– 723, 2009

David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-L´ aszl´ o Barab´ asi, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, et al. Compu- tational social science.Science, 323(5915):721– 723, 2009

2009
[24]

arXiv preprint arXiv:2407.18932 , year=

Xuchuan Li, Fei Huang, Jianrong Lv, Zhix- iong Xiao, Guolong Li, and Yang Yue. Be more real: Travel diary generation using llm agents and individual profiles.arXiv preprint arXiv:2407.18932, 2024

work page arXiv 2024
[25]

A large lan- guage model for feasible and diverse popula- tion synthesis.arXiv preprint arXiv:2505.04196, 2025

Sung Yoo Lim, Hyunsoo Yun, Prateek Bansal, Dong-Kyu Kim, and Eui-Jin Kim. A large lan- guage model for feasible and diverse popula- tion synthesis.arXiv preprint arXiv:2505.04196, 2025

work page arXiv 2025
[26]

arXiv preprint arXiv:2506.23306 , year=

Qi Liu, Can Li, and Wanjing Ma. Gatsim: Ur- ban mobility simulation with generative agents. arXiv preprint arXiv:2506.23306, 2025

work page arXiv 2025
[27]

Toward llm-agent-based modeling of transporta- tion systems: A conceptual framework.Artificial Intelligence for Transportation, 1:100001, 2025

Tianming Liu, Jirong Yang, and Yafeng Yin. Toward llm-agent-based modeling of transporta- tion systems: A conceptual framework.Artificial Intelligence for Transportation, 1:100001, 2025

2025
[28]

AgentBench: Evaluating LLMs as Agents

Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, et al. Agent- bench: Evaluating llms as agents.arXiv preprint arXiv:2308.03688, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[29]

Mason: A multiagent simulation environment.Simulation, 81(7):517–527, 2005

Sean Luke, Claudio Cioffi-Revilla, Liviu Panait, Keith Sullivan, and Gabriel Balan. Mason: A multiagent simulation environment.Simulation, 81(7):517–527, 2005

2005
[30]

Learning universal human mobility patterns with a foundation model for cross-domain data fusion.Transportation Research Part C: Emerg- ing Technologies, 180:105311, 2025

Haoxuan Ma, Xishun Liao, Yifan Liu, Qinhua Jiang, Chris Stanford, Shangqing Cao, and Jiaqi Ma. Learning universal human mobility patterns with a foundation model for cross-domain data fusion.Transportation Research Part C: Emerg- ing Technologies, 180:105311, 2025

2025
[31]

Data- driven generation of spatio-temporal routines in human mobility.Data Mining and Knowledge Discovery, 32(3):787–829, 2018

Luca Pappalardo and Filippo Simini. Data- driven generation of spatio-temporal routines in human mobility.Data Mining and Knowledge Discovery, 32(3):787–829, 2018

2018
[32]

Generative agents: Interac- tive simulacra of human behavior

Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interac- tive simulacra of human behavior. InProceedings of the 36th annual acm symposium on user inter- face software and technology, pages 1–22, 2023

2023
[33]

AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society

Jinghua Piao, Yuwei Yan, Jun Zhang, Nian Li, Junbo Yan, Xiaochong Lan, Zhihong Lu, Zhi- heng Zheng, Jing Yi Wang, Di Zhou, et al. Agentsociety: Large-scale simulation of llm- driven generative agents advances understanding of human behaviors and society.arXiv preprint arXiv:2502.08691, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[34]

Toolformer: Language mod- els can teach themselves to use tools.Ad- vances in Neural Information Processing Sys- tems, 36:68539–68551, 2023

Timo Schick, Jane Dwivedi-Yu, Roberto Dess` ı, Roberta Raileanu, Maria Lomeli, Eric Ham- bro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language mod- els can teach themselves to use tools.Ad- vances in Neural Information Processing Sys- tems, 36:68539–68551, 2023

2023
[35]

Building, composing and exper- imenting complex spatial models with the gama platform.GeoInformatica, 23(2):299–322, 2019

Patrick Taillandier, Benoit Gaudou, Arnaud Grignard, Quang-Nghi Huynh, Nicolas Maril- leau, Philippe Caillou, Damien Philippon, and Alexis Drogoul. Building, composing and exper- imenting complex spatial models with the gama platform.GeoInformatica, 23(2):299–322, 2019

2019
[36]

Gemma 3 Technical Report

Gemma Team, Aishwarya Kamath, Johan Fer- ret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ram´ e, Morgane Rivi` ere, et al. Gemma 3 technical report.arXiv preprint arXiv:2503.19786, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[37]

Netlogo: A sim- ple environment for modeling complexity

Seth Tisue, Uri Wilensky, et al. Netlogo: A sim- ple environment for modeling complexity. InIn- ternational conference on complex systems, vol- ume 21, pages 16–21. Boston, MA, 2004

2004
[38]

Large language models as urban residents: An llm agent framework for personal mobility gen- eration.Advances in Neural Information Pro- cessing Systems, 37:124547–124574, 2024

Jiawei Wang, Renhe Jiang, Chuang Yang, Zengqing Wu, Makoto Onizuka, Ryosuke Shibasaki, Noboru Koshizuka, and Chuan Xiao. Large language models as urban residents: An llm agent framework for personal mobility gen- eration.Advances in Neural Information Pro- cessing Systems, 37:124547–124574, 2024

2024
[39]

The rise and potential of large language model based agents: A survey.Science China Information Sciences, 68(2):121101, 2025

Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, et al. The rise and potential of large language model based agents: A survey.Science China Information Sciences, 68(2):121101, 2025

2025
[40]

Yjmob100k: City- scale and longitudinal dataset of anonymized human mobility trajectories.Scientific Data, 11(1):397, 2024

Takahiro Yabe, Kota Tsubouchi, Toru Shimizu, Yoshihide Sekimoto, Kaoru Sezaki, Esteban Moro, and Alex Pentland. Yjmob100k: City- scale and longitudinal dataset of anonymized human mobility trajectories.Scientific Data, 11(1):397, 2024. 21

2024
[41]

OpenCity: a scalable platform to simulate urban activities with massive LLM agents.arXiv preprint arXiv:2410.21286, 2024

Yuwei Yan, Qingbin Zeng, Zhiheng Zheng, Jingzhe Yuan, Jie Feng, Jun Zhang, Fengli Xu, and Yong Li. Opencity: A scalable platform to simulate urban activities with massive llm agents.arXiv preprint arXiv:2410.21286, 2024

work page arXiv 2024
[42]

React: Synergizing reasoning and acting in language models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. InThe eleventh inter- national conference on learning representations, 2022

2022
[43]

Xiaotong Ye, Nicolas Bougie, Toshihiko Yamasaki, and Narimasa Watanabe. MobileCity: An efficient framework for large-scale urban behavior simulation. arXiv preprint arXiv:2504.16946, 2025.
[44]

Lan Zhang, Yuxuan Hu, Weihua Li, Quan Bai, and Parma Nand. LLM-AIDSim: LLM-enhanced agent-based influence diffusion simulation in social networks. Systems, 13(1):29, 2025.
[45]

Xinnong Zhang, Jiayu Lin, Xinyi Mou, Shiyue Yang, Xiawei Liu, Libo Sun, Hanjia Lyu, Yihang Yang, Weihong Qi, Yue Chen, et al. SocioVerse: A world model for social simulation powered by LLM agents and a pool of 10 million real-world users. arXiv preprint arXiv:2504.10157, 2025.
[46]

Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, et al. WebArena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854, 2023.
