arxiv: 2604.17456 · v1 · submitted 2026-04-19 · 💻 cs.AI

TrafficClaw: Generalizable Urban Traffic Control via Unified Physical Environment Modeling

Siqi Lai, Pan Zhang, Yuping Zhou, Jindong Han, Yansong Ning, Hao Liu

Pith reviewed 2026-05-10 05:43 UTC · model grok-4.3

classification 💻 cs.AI

keywords: urban traffic control, LLM agent, unified environment, system-level optimization, traffic simulation, reinforcement learning, spatiotemporal reasoning

The pith

A single shared environment lets one LLM agent coordinate traffic signals, freeways, transit, and taxis across a city.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Urban traffic control spans many linked parts that influence each other, yet most methods treat signals, freeways, buses, and taxis as separate problems. The paper argues that only a unified physical environment can make cross-subsystem effects visible and allow consistent control. TrafficClaw builds this environment so that changes in one part propagate through shared roads and demand, then places an LLM agent inside it to diagnose issues and adjust plans over time. The agent uses step-by-step spatiotemporal reasoning plus memory of past procedures, and is trained first by imitation and then by system-level reinforcement learning. Tests show the resulting policies work on traffic patterns, dynamics, and task goals the agent never encountered during training.

Core claim

TrafficClaw constructs a unified runtime environment that folds traffic signals, freeways, public transit, and taxi services into one dynamical system sharing infrastructure and mobility demand. Inside this environment an LLM agent performs executable spatiotemporal reasoning and maintains reusable procedural memory to diagnose problems across subsystems and refine control strategies. A multi-stage pipeline first initializes the agent with supervision and then applies agentic reinforcement learning with system-level rewards. Experiments demonstrate that the resulting controller produces robust, transferable, and system-aware performance on previously unseen scenarios, dynamics, and task configurations.

What carries the argument

The unified runtime environment, which couples heterogeneous subsystems through shared physical infrastructure, mobility demand, and spatiotemporal constraints and supplies closed-loop feedback to the LLM agent.

If this is right

Local interventions in one subsystem produce predictable effects on connected subsystems through the shared model.
Control policies transfer to new traffic volumes, speeds, and task definitions without task-specific retraining.
System-level reward signals yield coordinated behavior that isolated optimizations cannot achieve.
Continual refinement using procedural memory improves performance over repeated interactions with the environment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same unified-environment approach could be tested on other coupled physical networks such as power distribution or freight logistics.
Deployment would require checking whether the simulated couplings match measurements from real multi-agency city sensors.
The framework leaves open whether non-LLM planners could replace the agent inside the same environment.

Load-bearing premise

A single simulation can faithfully represent how actions in one traffic subsystem affect the others through shared roads and demand patterns.

What would settle it

A controlled test in which the agent is given a scenario with tight coupling between signal timing and freeway ramp metering; the agent's proposed actions are then compared against separate subsystem controllers on total network delay and throughput.

Figures

Figures reproduced from arXiv: 2604.17456 by Hao Liu, Jindong Han, Pan Zhang, Siqi Lai, Yansong Ning, Yuping Zhou.

**Figure 2.** The framework overview of TrafficClaw. … closed-loop interactions with the environment under coupled traffic dynamics. (2) In parallel, we introduce a spatiotemporal memory mechanism that maintains cross-episode analytical context, enabling the accumulation of reusable procedural knowledge across diverse traffic regimes and improving coherence and effectiveness in long-horizon reasoning and control. (3) To e…

**Figure 4.** Ablation on training and memory management.

**Figure 6.** Case study of self-improvement in TrafficClaw.

**Figure 7.** RL convergence of the agent during training.

Original abstract

Urban traffic control is a system-level coordination problem spanning heterogeneous subsystems, including traffic signals, freeways, public transit, and taxi services. Existing optimization-based, reinforcement learning (RL), and emerging LLM-based approaches are largely designed for isolated tasks, limiting both cross-task generalization and the ability to capture coupled physical dynamics across subsystems. We argue that effective system-level control requires a unified physical environment in which subsystems share infrastructure, mobility demand, and spatiotemporal constraints, allowing local interventions to propagate through the network. To this end, we propose TrafficClaw, a framework for general urban traffic control built upon a unified runtime environment. TrafficClaw integrates heterogeneous subsystems into a shared dynamical system, enabling explicit modeling of cross-subsystem interactions and closed-loop agent-environment feedback. Within this environment, we develop an LLM agent with executable spatiotemporal reasoning and reusable procedural memory, supporting unified diagnostics across subsystems and continual strategy refinement. Furthermore, we introduce a multi-stage training pipeline with supervised initialization and agentic RL with system-level optimization, further enabling coordinated and system-aware performance. Experiments demonstrate that TrafficClaw achieves robust, transferable, and system-aware performance across unseen traffic scenarios, dynamics, and task configurations. Our project is available at https://github.com/usail-hkust/TrafficClaw.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TrafficClaw frames multi-subsystem traffic control as one coupled environment with an LLM agent on top, but the generalization claims rest on experiments whose details are not visible yet.

TrafficClaw tries to treat urban traffic as a single coordinated system rather than separate optimizations for signals, freeways, transit, and taxis. The central idea is a shared runtime environment that lets interventions in one part propagate through shared demand and physical constraints, with an LLM agent that uses procedural memory to diagnose issues and refine strategies over time. A multi-stage pipeline moves from supervised initialization to agentic RL aimed at system-level rewards.

This setup directly targets the limitation that isolated methods miss cross-subsystem effects, which is a real issue in actual cities where congestion spreads quickly. The architecture description is clear on how closed-loop feedback works and why procedural memory could support continual adaptation without retraining from scratch each time. The motivation section does a solid job showing why current RL or optimization approaches fall short when subsystems interact.

The soft spots sit in the evaluation. The abstract states that the system shows robust, transferable performance on unseen scenarios and dynamics, yet no concrete metrics, baselines, ablation results, or implementation specifics appear to back that up. Without those, it is hard to separate how much comes from the unified environment versus standard techniques or to check whether the LLM's spatiotemporal reasoning stays reliable under real load. The assumption that one environment can faithfully capture all heterogeneous constraints without losing accuracy also needs evidence, as oversimplification there would undermine the whole claim.

This paper is for researchers working on AI-driven transportation systems or hybrid LLM-RL control. Someone exploring multi-domain coordination or memory-augmented agents could extract useful pieces from the pipeline and environment design.

It deserves a serious referee because the framing is coherent and the underlying problem is practical, even though the results section will require careful checking for reproducibility and effect sizes. I would send it to peer review so the authors can get targeted feedback on the experiments and whether the LLM component scales as described.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces TrafficClaw, a framework for generalizable urban traffic control. It proposes a unified runtime environment integrating heterogeneous subsystems (traffic signals, freeways, public transit, taxi services) into a shared dynamical system to explicitly model cross-subsystem interactions and closed-loop feedback. An LLM agent with executable spatiotemporal reasoning and reusable procedural memory is developed to support unified diagnostics and continual strategy refinement. A multi-stage training pipeline (supervised initialization followed by agentic RL with system-level optimization) is presented. Experiments are claimed to demonstrate robust, transferable, and system-aware performance across unseen traffic scenarios, dynamics, and task configurations.

Significance. If the central claims hold, the work could advance the field by shifting from isolated-task optimization/RL/LLM methods to system-level coordination that accounts for coupled physical dynamics. The open-source release at the provided GitHub link supports reproducibility and extension. The combination of unified physical modeling with LLM-based reasoning is a promising direction for complex, multi-subsystem control problems.

major comments (3)

[Unified Runtime Environment] The central generalization claim depends on the unified runtime environment faithfully capturing coupled dynamics and constraint propagation across subsystems. The high-level description in the methods leaves unclear how shared mobility demand and spatiotemporal constraints are implemented without introducing inconsistencies (e.g., mismatched time scales or infeasible state transitions between traffic signals and transit).
[LLM Agent Design] The LLM agent's executable spatiotemporal reasoning and reusable procedural memory are load-bearing for the system-aware performance. Without concrete details on the execution interface, error recovery mechanism, or how hallucinations are mitigated during continual refinement, it is difficult to assess reliability in the closed-loop setting.
[Training Pipeline] The multi-stage training pipeline (supervised initialization + agentic RL) is presented as enabling coordinated performance, yet no ablation isolating the contribution of each stage or the system-level reward formulation is referenced. This weakens the attribution of the reported robustness to the proposed architecture.

minor comments (2)

[Abstract] The abstract states that experiments demonstrate performance on 'unseen traffic scenarios, dynamics, and task configurations' but does not name the simulation platform, number of scenarios, or quantitative metrics; adding these would improve clarity.
[Methods] Notation for the shared dynamical system (state variables, transition functions) should be introduced consistently in the first methods subsection to aid readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments identify key areas where additional clarification and evidence would strengthen the presentation of TrafficClaw. We address each major comment below and commit to a major revision that incorporates expanded implementation details, architectural descriptions, and new ablation studies.

Referee: [Unified Runtime Environment] The central generalization claim depends on the unified runtime environment faithfully capturing coupled dynamics and constraint propagation across subsystems. The high-level description in the methods leaves unclear how shared mobility demand and spatiotemporal constraints are implemented without introducing inconsistencies (e.g., mismatched time scales or infeasible state transitions between traffic signals and transit).

Authors: We agree that the current description is insufficiently detailed to fully substantiate the generalization claims. In the revised manuscript we will expand Section 3.1 with a dedicated implementation subsection. Shared mobility demand will be described as generated by a single demand engine that samples from a joint spatiotemporal distribution derived from real urban datasets, ensuring identical demand realizations across all subsystems. Spatiotemporal constraints are enforced by a discrete-event simulator with a global clock and adaptive sub-stepping; each subsystem registers its update frequency while a central constraint validator rejects any transition that would violate network capacity or timing invariants before the state is committed. We will include pseudocode for the transition function, a data-flow diagram, and explicit discussion of how mismatched time scales are reconciled without introducing infeasible states. revision: yes
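
The mechanism described in this response (a global clock, per-subsystem update frequencies, and a validator that rejects infeasible transitions before commit) can be sketched in a few lines. This is an editorial illustration, not the authors' simulator: the names `Subsystem`, `UnifiedEnv`, `is_feasible`, the link names, and the capacity values are all hypothetical, and fixed integer periods stand in for the adaptive sub-stepping the authors mention.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, float]  # hypothetical shared state: vehicles per link


@dataclass
class Subsystem:
    name: str
    period: int  # update every `period` ticks of the global clock
    propose: Callable[[State, int], State]  # returns proposed state changes


@dataclass
class UnifiedEnv:
    state: State
    capacity: Dict[str, float]
    subsystems: List[Subsystem] = field(default_factory=list)
    clock: int = 0
    rejected: List[str] = field(default_factory=list)

    def is_feasible(self, candidate: State) -> bool:
        # Capacity invariant: every link stays within [0, capacity].
        return all(0.0 <= candidate[k] <= self.capacity[k] for k in self.capacity)

    def step(self) -> None:
        # Advance the global clock, then let each due subsystem propose an update.
        self.clock += 1
        for sub in self.subsystems:
            if self.clock % sub.period != 0:
                continue  # this subsystem runs on a slower time scale
            candidate = {**self.state, **sub.propose(self.state, self.clock)}
            if self.is_feasible(candidate):
                self.state = candidate  # commit only validated transitions
            else:
                self.rejected.append(f"t={self.clock}:{sub.name}")


def signal_update(state: State, t: int) -> State:
    # Toy signal plan: release up to 5 vehicles from the arterial onto the ramp.
    moved = min(5.0, state["arterial"])
    return {"arterial": state["arterial"] - moved, "ramp": state["ramp"] + moved}


def transit_update(state: State, t: int) -> State:
    # Toy transit action that tries to add 8 vehicle-equivalents to the ramp.
    return {"ramp": state["ramp"] + 8.0}


env = UnifiedEnv(
    state={"arterial": 20.0, "ramp": 0.0},
    capacity={"arterial": 30.0, "ramp": 10.0},
    subsystems=[Subsystem("signals", 1, signal_update),
                Subsystem("transit", 2, transit_update)],
)
for _ in range(2):
    env.step()
```

In this toy run the signal subsystem fills the ramp to its capacity over two ticks, so the transit update at tick 2 is rejected instead of committed. That is the kind of cross-subsystem effect the referee asks the authors to document.
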
Referee: [LLM Agent Design] The LLM agent's executable spatiotemporal reasoning and reusable procedural memory are load-bearing for the system-aware performance. Without concrete details on the execution interface, error recovery mechanism, or how hallucinations are mitigated during continual refinement, it is difficult to assess reliability in the closed-loop setting.

Authors: We acknowledge that the reliability mechanisms require explicit exposition. The revised Section 4.2 will detail the execution interface as a tool-calling layer that translates LLM outputs into sandboxed Python calls against the simulator API. Error recovery uses a bounded retry loop (maximum three iterations) that feeds environment-generated error messages back to the agent; only verified successful executions are written to procedural memory. Hallucination mitigation combines (i) grounding every reasoning step against the current simulator state snapshot, (ii) a self-consistency check that requires the agent to emit an executable verification predicate before committing an action, and (iii) periodic memory pruning against observed outcomes. Example closed-loop traces will be added to the appendix. revision: yes
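
The error-recovery loop in this response can be illustrated with a minimal sketch. It assumes "maximum three iterations" means at most three attempts in total; the function `run_with_retries`, the stub agent, and the stub simulator are hypothetical stand-ins rather than the authors' tool-calling layer or simulator API.

```python
from typing import Any, Callable, Dict, List, Optional

MAX_ATTEMPTS = 3  # bound stated in the rebuttal, read here as total attempts

Action = Dict[str, Any]


def run_with_retries(
    propose: Callable[[Optional[str]], Action],
    execute: Callable[[Action], Any],
    verify: Callable[[Any], bool],
    memory: List[Action],
) -> Optional[Any]:
    """Try an action up to MAX_ATTEMPTS times, feeding errors back to the proposer."""
    feedback: Optional[str] = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        action = propose(feedback)
        try:
            result = execute(action)
        except Exception as exc:
            # Environment-generated error message goes back to the agent.
            feedback = f"attempt {attempt} failed: {exc}"
            continue
        if verify(result):
            memory.append(action)  # only verified successes reach memory
            return result
        feedback = f"attempt {attempt} failed verification: {result!r}"
    return None  # budget exhausted, nothing written to memory


def stub_agent(feedback: Optional[str]) -> Action:
    # Stand-in for the LLM: an invalid first proposal, corrected after feedback.
    return {"green_s": -5} if feedback is None else {"green_s": 30}


def stub_simulator(action: Action) -> Dict[str, int]:
    # Stand-in for the simulator API: rejects non-positive green times.
    if action["green_s"] <= 0:
        raise ValueError("green_s must be positive")
    return {"delay_s": 120 - action["green_s"]}


memory: List[Action] = []
result = run_with_retries(stub_agent, stub_simulator,
                          lambda r: r["delay_s"] < 100, memory)

# A proposer that never recovers exhausts the budget and leaves memory untouched.
failed_memory: List[Action] = []
failed = run_with_retries(lambda fb: {"green_s": -1}, stub_simulator,
                          lambda r: True, failed_memory)
```

Gating memory writes on the verification predicate keeps a failed or unverified action out of the reusable procedures, which is the property the referee's hallucination concern turns on.
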
Referee: [Training Pipeline] The multi-stage training pipeline (supervised initialization + agentic RL) is presented as enabling coordinated performance, yet no ablation isolating the contribution of each stage or the system-level reward formulation is referenced. This weakens the attribution of the reported robustness to the proposed architecture.

Authors: The absence of component-wise ablations is a valid concern. We will add a new subsection (5.3) containing ablation experiments that isolate (a) supervised initialization alone, (b) agentic RL from random initialization, and (c) the full pipeline with and without the cross-subsystem coupling term in the reward. The system-level reward is defined as a weighted sum of per-subsystem metrics plus an explicit interaction penalty; we will report the exact weighting and penalty formulation. Results will be presented in a new table with statistical significance tests, allowing direct attribution of robustness gains to each stage. revision: yes
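
The reward structure in this response (a weighted sum of per-subsystem metrics plus an explicit interaction penalty) can be written down as a short sketch. The exact weights and penalty are not given in the material above, so the equal weights, the metric values, the scalar `interaction_cost`, and the coefficient `penalty_coef` below are placeholders chosen only for illustration.

```python
from typing import Dict


def system_reward(
    metrics: Dict[str, float],
    weights: Dict[str, float],
    interaction_cost: float,
    penalty_coef: float,
) -> float:
    """Weighted sum of per-subsystem metrics minus a scaled interaction penalty."""
    if metrics.keys() != weights.keys():
        raise ValueError("metrics and weights must cover the same subsystems")
    weighted = sum(weights[name] * metrics[name] for name in metrics)
    return weighted - penalty_coef * interaction_cost


# Placeholder per-subsystem scores in [0, 1]; higher is better.
metrics = {"signals": 0.8, "freeway": 0.6, "transit": 0.5, "taxi": 0.7}
weights = {name: 0.25 for name in metrics}  # assumed equal weighting

with_coupling = system_reward(metrics, weights, interaction_cost=0.2, penalty_coef=0.5)
without_coupling = system_reward(metrics, weights, interaction_cost=0.2, penalty_coef=0.0)
```

Setting `penalty_coef` to zero gives the "without the cross-subsystem coupling term" arm of the proposed ablation, so the two values differ by exactly the penalty contribution.
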

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper's central claims rest on the construction of a unified runtime environment that integrates heterogeneous traffic subsystems, an LLM agent with spatiotemporal reasoning, and a multi-stage training pipeline (supervised initialization plus agentic RL). These are presented as novel engineering contributions whose value is assessed via experimental performance on held-out scenarios, dynamics, and task configurations. No equations, fitted parameters, or predictions are shown to reduce by construction to the same inputs; the abstract and framework description contain no self-definitional loops, no renaming of known results as new derivations, and no load-bearing self-citations that substitute for independent justification. The derivation chain is therefore self-contained as a proposed system whose generalization is tested externally rather than assumed.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the unproven premise that heterogeneous subsystems can be faithfully merged into one dynamical system and that LLM agents can execute reliable closed-loop control in that system; these are domain assumptions rather than derived results.

axioms (1)

domain assumption: Heterogeneous traffic subsystems share infrastructure, mobility demand, and spatiotemporal constraints that can be modeled as a single dynamical system.
Explicitly stated as the argument for building a unified runtime environment.

invented entities (2)

TrafficClaw unified runtime environment (no independent evidence)
purpose: Integrate subsystems and enable cross-subsystem interaction modeling
Newly proposed framework component with no independent evidence supplied in the abstract.
LLM agent with executable spatiotemporal reasoning and reusable procedural memory (no independent evidence)
purpose: Perform unified diagnostics and continual strategy refinement
Newly introduced agent architecture whose capabilities are asserted rather than demonstrated outside the paper.
