Long-term Traffic Simulation via Structured Autoregressive Modeling
Pith reviewed 2026-07-01 05:53 UTC · model grok-4.3
The pith
Small frozen LLMs adapt to traffic simulation through motion-language token consistency, powering RosettaSim for stable long-horizon multi-agent modeling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RosettaSim projects scene topology, agent states, and spawning intents into a structured autoregressive stream of variable length; small frozen LLMs then generate sustained multi-agent traffic behavior, reaching state-of-the-art results on both short- and long-term metrics of the Waymo Open Sim Agent Challenge. Retrieval-based Traffic Evaluation retrieves semantically similar real-world scenarios as context-aware anchors and achieves a higher correlation (r=0.83) with standard metrics than prior evaluation approaches (r=0.74).
What carries the argument
RosettaSim, the unified framework that converts dynamic traffic scenes into a single variable-length structured autoregressive token stream, leveraging LLM attention transfer and motion-natural-language distributional consistency.
If this is right
- RosettaSim reaches state-of-the-art accuracy on both short-term and long-term simulation tasks in the Waymo Open Sim Agent Challenge.
- Retrieval-based Traffic Evaluation supplies reference anchors for long rollouts and correlates more strongly with standard metrics (r=0.83) than prior evaluation approaches (r=0.74).
- Variable-length autoregressive streams naturally accommodate agents entering and exiting the scene.
- Heavily frozen small LLMs suffice for the adaptation once the scene is projected into the structured token stream.
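To make the variable-length stream concrete, here is a minimal sketch, not the paper's tokenizer. The token names (`<map>`, `<step>`, `<spawn:…>`, `<exit:…>`), the `AgentState` fields, and the `serialize_scene` helper are all hypothetical; the source says only that scene topology, agent states, and spawning intents are projected into one structured autoregressive stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentState:
    agent_id: int
    motion_token: int  # hypothetical index into a discretized motion vocabulary


def serialize_scene(map_tokens, steps):
    """Flatten a scene into a single token list (illustrative encoding only).

    map_tokens: list of ints standing in for scene topology.
    steps: list of lists of AgentState, one inner list per timestep.
    Agents may appear or disappear between timesteps, so each step
    contributes a different number of tokens.
    """
    stream = ["<map>"] + [f"m{t}" for t in map_tokens]
    active = set()
    for agents in steps:
        stream.append("<step>")
        current = {a.agent_id for a in agents}
        # Mark agents that left the scene since the previous step.
        for gone in sorted(active - current):
            stream.append(f"<exit:{gone}>")
        for a in sorted(agents, key=lambda s: s.agent_id):
            # Mark agents that were not present in the previous step.
            if a.agent_id not in active:
                stream.append(f"<spawn:{a.agent_id}>")
            stream.append(f"a{a.agent_id}:v{a.motion_token}")
        active = current
    return stream


steps = [
    [AgentState(0, 5), AgentState(1, 9)],
    [AgentState(0, 6), AgentState(1, 9), AgentState(2, 3)],  # agent 2 enters
    [AgentState(0, 6), AgentState(2, 4)],                    # agent 1 exits
]
stream = serialize_scene([17, 42], steps)
```

In this toy encoding the per-step token count changes with the number of active agents, which is the property the list above attributes to variable-length streams.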
Where Pith is reading between the lines
- If the token-consistency mechanism generalizes, the same projection technique could be tested on other continuous physical domains such as pedestrian crowds or robotic manipulation without domain-specific retraining.
- RTE-style retrieval anchors might be applied to other long-horizon simulation benchmarks to reduce reliance on fading one-to-one agent matching.
- Stable long-horizon traffic models produced this way could serve as drop-in world models inside closed-loop autonomous-driving planners.
Load-bearing premise
Distributional consistency between motion tokens and natural language tokens is sufficient for small frozen LLMs to adapt rapidly to traffic modeling without substantial fine-tuning or architectural changes.
What would settle it
A controlled experiment in which small frozen LLMs given the same structured stream but without the claimed motion-language consistency show no advantage over non-LLM autoregressive baselines on long-horizon WOSAC rollouts.
Original abstract
Interactive traffic simulation is a vital world model for autonomous driving. A central challenge in long-horizon simulation is modeling sustained multi-agent interactions, which is further exacerbated by dynamic token cardinality as agents continuously enter and exit the scene. In this work, we propose that the solution lies in the synergy between the architectural inductive biases and statistical priors of large-scale sequence models, e.g., Large Language Models (LLMs). Our probing experiments reveal that the transferability of attention mechanisms and the distributional consistency between motion tokens and natural language enable small-scale, heavily frozen LLMs to rapidly adapt to traffic modeling. Building on this insight, we introduce RosettaSim, a unified framework that projects scene topology, agent states, and spawning intents into a structured autoregressive stream with variable length, achieving both strong short-term accuracy and stable long-horizon simulation fidelity. Furthermore, evaluating extended rollouts presents yet another hurdle, as one-to-one agent correspondence inevitably fades over time. To address this, we introduce Retrieval-based Traffic Evaluation (RTE), which retrieves semantically similar real-world scenarios as context-aware reference anchors. Experiments on the Waymo Open Sim Agent Challenge (WOSAC) demonstrate that RosettaSim achieves state-of-the-art performance in both short- and long-term simulation. Furthermore, RTE exhibits a stronger correlation with standard metrics ($r=0.83$) than existing approaches ($r=0.74$), indicating improved alignment with long-horizon simulation fidelity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that attention transferability and distributional consistency between motion tokens and natural-language tokens allow small, heavily frozen LLMs to adapt rapidly to variable-cardinality traffic sequences. It introduces RosettaSim, which encodes scene topology, agent states, and spawning intents as a structured autoregressive stream, and reports SOTA results on both short- and long-horizon metrics of the Waymo Open Sim Agent Challenge. It further proposes Retrieval-based Traffic Evaluation (RTE) that retrieves semantically similar real-world scenarios as anchors and shows that RTE correlates more strongly (r=0.83) with standard metrics than prior approaches (r=0.74).
Significance. If the adaptation mechanism and the reported WOSAC numbers are substantiated, the work would provide evidence that pre-trained sequence models can be repurposed for sustained multi-agent simulation with limited architectural change, addressing a recognized bottleneck in long-horizon traffic world models. The RTE metric would also supply a concrete, reference-anchored alternative for evaluating rollouts where agent identity is lost.
major comments (2)
- [Abstract, probing-experiments paragraph] The central claim that distributional consistency plus attention transferability suffices for rapid adaptation of heavily frozen small LLMs is load-bearing for the SOTA assertion, yet the supplied text contains no controls (freezing-ratio curves, a from-scratch autoregressive baseline of identical size, or explicit frozen-parameter counts) that would confirm the adaptation occurs under the stated constraints.
- [Abstract, RTE paragraph] The reported correlation improvement (r=0.83 vs. r=0.74) is presented as evidence of better long-horizon fidelity, but without a description of how the retrieval anchors were selected or whether the correlation was computed on held-out data, it is impossible to rule out that the anchors were tuned on the same test distribution used for the WOSAC numbers.
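For readers who want to see what is being compared in the second comment, the following is a minimal sketch of a correlation between a candidate evaluation metric and a standard metric. It assumes that r denotes the Pearson coefficient, which the source does not state explicitly. The score lists are made-up placeholders, not numbers from the paper, and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-model scores (illustrative only, not from the paper).
standard_metric = [0.70, 0.72, 0.75, 0.78, 0.80]
candidate_metric = [0.41, 0.45, 0.44, 0.52, 0.55]
r = pearson_r(candidate_metric, standard_metric)
```

Whatever the exact coefficient used, the referee's concern is about which scenarios feed the two score lists, not about the formula itself.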
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.
- Referee: Probing-experiments paragraph (abstract): the central claim that distributional consistency plus attention transferability suffices for rapid adaptation of heavily frozen small LLMs is load-bearing for the SOTA assertion, yet the supplied text contains no controls (freezing-ratio curves, from-scratch autoregressive baseline of identical size, or explicit frozen-parameter counts) that would confirm the adaptation occurs under the stated constraints.
  Authors: We agree that the current probing experiments lack the requested controls. In the revised manuscript we will add freezing-ratio curves, a from-scratch autoregressive baseline of identical size, and explicit frozen-parameter counts to directly substantiate that adaptation occurs under heavy freezing.
  Revision: yes
- Referee: Abstract, RTE paragraph: the reported correlation improvement (r=0.83 vs. r=0.74) is presented as evidence of better long-horizon fidelity, but without a description of how the retrieval anchors were selected or whether the correlation was computed on held-out data, it is impossible to rule out that the anchors were tuned on the same test distribution used for the WOSAC numbers.
  Authors: We agree that the description of anchor selection and data partitioning is insufficient. We will revise the abstract and add explicit text stating that anchors are drawn from a held-out training subset and that the reported correlation is computed on the official test split, thereby eliminating any possibility of overlap with the WOSAC evaluation set.
  Revision: yes
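The separation promised in the second response can be checked mechanically. The sketch below retrieves the top-k reference anchors by cosine similarity and refuses to run if the anchor pool shares any scenario ID with the evaluation set. The embeddings, the scenario IDs, and the `retrieve_anchors` helper are hypothetical; the source does not specify the similarity measure or how scenarios are embedded.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_u * norm_v)


def retrieve_anchors(query_vec, pool, eval_ids, k=2):
    """Return the IDs of the k pool scenarios most similar to query_vec.

    pool: dict mapping scenario ID -> embedding (hypothetical reference set).
    eval_ids: IDs of scenarios being evaluated; these must not appear in pool.
    """
    overlap = set(pool) & set(eval_ids)
    if overlap:
        raise ValueError(f"anchor pool overlaps evaluation set: {sorted(overlap)}")
    ranked = sorted(pool, key=lambda sid: cosine(query_vec, pool[sid]), reverse=True)
    return ranked[:k]


# Toy 2-D embeddings (illustrative only).
pool = {
    "ref_a": [1.0, 0.0],
    "ref_b": [0.9, 0.1],
    "ref_c": [0.0, 1.0],
}
anchors = retrieve_anchors([1.0, 0.05], pool, eval_ids={"eval_1"}, k=2)
```

A guard of this kind only rules out identical scenario IDs; it does not address distributional overlap between the anchor pool and the test split, which is the broader point of the referee's comment.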
Circularity Check
No circularity: derivation chain remains self-contained
The paper introduces RosettaSim as a projection of scene elements into a structured autoregressive stream and RTE as a retrieval-based metric, then reports empirical WOSAC results and an r=0.83 correlation. No equation, definition, or self-citation reduces a claimed prediction or uniqueness result to its own inputs by construction. The probing-experiment insight on attention transferability and token distributional consistency is presented as an empirical observation supporting the framework, not as a fitted parameter renamed as output. RTE correlation is reported as a measured property rather than a tautological consequence of its own anchors. The central claims therefore rest on external benchmark performance rather than internal redefinition.