Controllable Sim Agents with Behavior Latents

Juanwu Lu; Junyu Zhu; Ziran Wang

arxiv: 2607.02496 · v1 · pith:2GFZQLDWnew · submitted 2026-07-02 · 💻 cs.RO · cs.LG


Pith reviewed 2026-07-03 10:38 UTC · model grok-4.3


keywords controllable simulation agents · behavior latents · variational inference · rectified flow · traffic simulation · Waymo Open Motion Dataset · soft eligibility gates · classifier-free guidance


The pith

CNeVA infers a per-agent Gaussian behavior latent from channel-specific returns to steer simulated traffic agents along independent axes such as speed and safety.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that makes traffic simulation agents both realistic imitators of logged data and steerable along separate behavioral dimensions. It derives a low-dimensional Gaussian latent for each agent directly from per-channel discounted returns using a closed-form variational update, then feeds this latent into a rectified-flow trajectory generator trained with masked curricula. Soft eligibility gates replace hard reward thresholds to keep gradients flowing for agents near decision boundaries. Experiments on the Waymo Open Motion Dataset show the model matches the realism of top imitation baselines while adding controllable axes for speed, acceleration, safety, and map compliance that prior models lack. The central demonstration is that these controls remain monotone and free of obvious reward-hacking artifacts when physical guardrails are applied.

Core claim

CNeVA learns to infer a per-agent Gaussian behavior latent from per-channel discounted returns via a closed-form conjugate variational update, conditioning a rectified-flow trajectory generator trained on a mixed channel-mask curriculum for classifier-free guidance. Soft eligibility gates replace hard binary thresholds with smooth exponential decay to preserve gradient signals. On the Waymo Open Motion Dataset the model reaches competitive realism while exposing per-channel controllability; speed- and acceleration-based steering yields monotone responses without stall-induced reward hacking, safety controllability is monotone and substantial, and map compliance becomes steerable under a cont

What carries the argument

The per-agent Gaussian behavior latent inferred from per-channel discounted returns via closed-form conjugate variational update, which conditions the rectified-flow trajectory generator under classifier-free guidance.
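
As a rough illustration of what a closed-form conjugate update looks like, the sketch below applies the textbook Normal-Normal rule to one latent dimension per channel. It is not the paper's update, which this page does not reproduce: the zero-mean prior, the unit variances, and the names `gaussian_conjugate_update` and `infer_behavior_latent` are all assumptions made for illustration.

```python
def gaussian_conjugate_update(prior_mean, prior_var, observation, obs_var):
    """Posterior over a scalar latent z given one noisy observation of z.

    Assumes z ~ N(prior_mean, prior_var) and observation ~ N(z, obs_var).
    """
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    post_precision = prior_precision + obs_precision
    # Precision-weighted average of the prior mean and the observation.
    post_mean = (prior_precision * prior_mean + obs_precision * observation) / post_precision
    return post_mean, 1.0 / post_precision


def infer_behavior_latent(channel_returns, prior_var=1.0, obs_var=1.0):
    """Apply the update per channel (a diagonal-Gaussian assumption, hypothetical)."""
    return {
        name: gaussian_conjugate_update(0.0, prior_var, ret, obs_var)
        for name, ret in channel_returns.items()
    }


latent = infer_behavior_latent({"speed": 0.8, "safety": -0.4})
```

With equal prior and observation variances, each posterior mean sits halfway between the prior mean and the observed return, and the posterior variance is half the prior variance. No iterative optimization is needed, which is the practical appeal of a conjugate form.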

If this is right

The model attains competitive realism on the Waymo Open Motion Dataset benchmark.
Speed- and acceleration-based steering produces monotone responses without stall-induced reward hacking.
Safety controllability is monotone and substantial once soft eligibility gates are introduced.
Map compliance becomes steerable under a context-residual return measure.
Steering metrics must be interpreted together with physical-plausibility guardrails to avoid reward-hacking confounds.
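
One way to picture the guardrails mentioned in the last item is a pair of summary statistics computed next to any steering metric: the fraction of agents that stall and the ratio of simulated to logged speed. The sketch below is a hedged example only; the stall definition, the 0.5 m/s threshold, and the function name `plausibility_guardrails` are assumptions, not the paper's definitions.

```python
def plausibility_guardrails(sim_speeds, logged_speeds, stall_threshold=0.5):
    """Summarize stalling and speed retention for a batch of rollouts.

    sim_speeds, logged_speeds: per-agent lists of speeds in m/s.
    stall_threshold is a hypothetical cutoff, not a value from the paper.
    """
    stalled = 0
    sim_total = 0.0
    logged_total = 0.0
    for sim, logged in zip(sim_speeds, logged_speeds):
        sim_mean = sum(sim) / len(sim)
        logged_mean = sum(logged) / len(logged)
        # Count a stall only when the logged agent was actually moving.
        if sim_mean < stall_threshold <= logged_mean:
            stalled += 1
        sim_total += sim_mean
        logged_total += logged_mean
    return {
        "stall_rate": stalled / len(sim_speeds),
        "speed_ratio": sim_total / logged_total,
    }


report = plausibility_guardrails(
    sim_speeds=[[0.0, 0.0], [10.0, 10.0]],
    logged_speeds=[[10.0, 10.0], [10.0, 10.0]],
)
```

In this toy batch one of two agents has stopped while its logged counterpart kept moving, so both statistics come out at 0.5. A steering gain reported alongside numbers like these would deserve suspicion.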

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same latent could be used to generate targeted edge-case scenarios for testing autonomous vehicle planners at scale.
The per-channel structure might transfer to other multi-agent domains such as pedestrian or drone simulation with minimal retraining.
Combining the latent with external map or weather inputs could produce controllable variations beyond the original training distribution.
The closed-form variational update may reduce the data volume needed to achieve controllable behavior compared with standard reinforcement-learning approaches.

Load-bearing premise

Per-channel discounted returns contain enough independent information to support a low-dimensional Gaussian latent whose dimensions can be varied independently without inducing correlations or reward-hacking artifacts in the generated trajectories.
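
For readers unfamiliar with the quantity this premise rests on, a per-channel discounted return can be sketched as below. The discount factor 0.99, the channel names, and the helper names are assumptions for illustration; the page does not state the paper's reward definitions or discount.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * rewards[t], accumulated from the last step backward."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def per_channel_returns(channel_rewards, gamma=0.99):
    """Map each channel name to the discounted return of its reward sequence."""
    return {name: discounted_return(r, gamma) for name, r in channel_rewards.items()}


returns = per_channel_returns(
    {"speed": [1.0, 1.0, 1.0], "safety": [0.0, 0.0, 1.0]},
    gamma=0.5,
)
```

Each channel is reduced to one scalar per agent, which is why the premise matters: if two of those scalars move together across agents, the latent dimensions built from them may not vary independently.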

What would settle it

A controlled experiment in which varying one latent dimension produces non-monotone trajectory changes or visible stall artifacts when steering speed or acceleration on held-out Waymo scenes.
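
The monotonicity criterion in that experiment can be stated as a small check over measured responses, ordered by increasing steering value. This is a sketch under assumed inputs: `responses` stands for whatever per-channel metric is measured at each steering level, and `tolerance` is a hypothetical allowance for seed noise.

```python
def is_monotone(responses, tolerance=0.0):
    """True if responses never drop by more than tolerance between steering levels."""
    return all(b - a >= -tolerance for a, b in zip(responses, responses[1:]))


clean = is_monotone([0.1, 0.3, 0.6])
dip = is_monotone([0.1, 0.5, 0.4])
dip_within_noise = is_monotone([0.1, 0.5, 0.4], tolerance=0.15)
```

A strict check flags the dip in the second series, while a tolerance sized to the across-seed standard deviation accepts it. Choosing that tolerance before looking at results keeps the test falsifiable.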

Figures

Figures reproduced from arXiv: 2607.02496 by Juanwu Lu, Junyu Zhu, Ziran Wang.

**Figure 2.** CNeVA infers a per-agent behavior latent

**Figure 3.** CNeVA qualitative rollouts at ρ = 1, w = 1.5. Fidelity. CNeVA achieves a null-path minADE of 1.113 ± 0.011 m and an off-road rate of 32.5 ± 0.2% at 200K steps on the WOMD validation split and minADE = 1.80 m on the WOMD testing split. The off-road rate climbs over the rollout to roughly 1.3 to 1.9× the logged-data rate by 8 s (logged ≈ 17%), reflecting open-loop drift rather than a static gap. Under the ident…

**Figure 4.** Reward-hacking contrast. Left: speed CSM is inflated 6× in the early checkpoint relative to the main model. Right: physical plausibility guardrails reveal that the early ablation achieves its CSM by stalling (76% stall, 61% of GT speed), while the main model retains 95% of GT speed. (clearance > 5 m, TTC > 6 s) receive no safety label, so the generator sees a near-zero safety signal. Replacing the threshol…

**Figure 5.** Simplified factor graph corresponding to the relaxed joint equation 3: the latent state

**Figure 6.** Empirical histogram of the per-channel discounted return

**Figure 7.** CSM diagonal ∆R_k for CNeVA versus the hard-eligibility ablation at ρ = 1, w = 1.5 (open-loop, context-residual return). Left: semantic channels (safety, map). Right: kinematic channels (speed, accel). Error bars show ±1 std over 5 seeds.

**Figure 8.** Map CSM diagonal (∆R_map) across three return measures, evaluated on the 200K soft-eligibility model at ρ = 1, w = 1.5 (open-loop). Map steering is controllable only under the context-residual definition (+0.61); the physical-offroad (−0.12) and lane-centerline (≈ 0) measures show no measurable physical-space response.

Abstract

Realistic traffic simulation requires agents that imitate logged behavior and can also be steered along interpretable axes. Such controllability enables engineers to isolate variables, reproduce specific edge cases, and test autonomous systems without real-world risk. We introduce Controllable Neural Variational Agents (CNeVA), a controllable simulated-agent framework that learns to infer a per-agent Gaussian behavior latent from per-channel discounted returns via a closed-form conjugate variational update, conditioning a rectified-flow trajectory generator trained on a mixed channel-mask curriculum for classifier-free guidance. To tackle scarcity in reward signals, we propose soft eligibility gates that replace hard binary thresholds with smooth exponential decay, preserving the gradient signal for near-threshold agents. On the Waymo Open Motion Dataset, CNeVA attains competitive realism on the benchmark while exposing per-channel controllability that the higher-ranked imitation models lack. Speed- and acceleration-based steering produces monotone responses without stall-induced reward hacking. Safety controllability is monotone and substantial with the introduction of soft eligibility. We manage to achieve steerable map compliance under a context-residual return measure. Furthermore, our experiment demonstrates that steering metrics must be read alongside physical-plausibility guardrails to avoid reward-hacking confounds.
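
To make the generator side of the abstract concrete, here is a minimal sketch of classifier-free guidance applied to a flow-style velocity field and integrated with fixed-step Euler. Everything named here is hypothetical: `velocity_fn` stands in for the trained network, `latent=None` stands in for the fully masked (null) condition, and the guidance convention v_null + w (v_cond − v_null) is one common choice that may differ from the paper's. The default w = 1.5 mirrors the value quoted in the figure captions.

```python
def guided_velocity(v_cond, v_null, w):
    """Extrapolate from the unconditional velocity toward the conditional one."""
    return [vn + w * (vc - vn) for vc, vn in zip(v_cond, v_null)]


def euler_sample(velocity_fn, x0, latent, w=1.5, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_cond = velocity_fn(x, t, latent)  # conditioned on the behavior latent
        v_null = velocity_fn(x, t, None)    # condition dropped (null path)
        v = guided_velocity(v_cond, v_null, w)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for a network: constant velocity that depends only on conditioning.
def toy_velocity(x, t, latent):
    return [1.0 if latent is None else 2.0 for _ in x]


sample = euler_sample(toy_velocity, x0=[0.0], latent={"speed": 0.4})
```

Under this convention w = 1 recovers the purely conditional velocity and w = 0 the unconditional one, so values above 1 push samples further in the direction the latent asks for. Training with channels randomly masked is what makes the null and partially conditioned paths available at sampling time.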

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CNeVA adds conjugate variational latents and soft gates for per-channel steering in traffic sim, but the abstract supplies no numbers to check if the controllability actually works.


The main thing here is a method called CNeVA that infers low-dimensional Gaussian behavior latents from per-channel discounted returns using a closed-form conjugate variational update, then feeds them into a rectified-flow generator trained with mixed channel masks and classifier-free guidance. Soft eligibility gates replace hard thresholds with exponential decay to keep gradients alive near boundaries. The goal is monotone steering on axes like speed, acceleration, and safety without the usual reward-hacking stalls.

The combination of the conjugate update, the soft gates, and the channel-masked curriculum looks like the actual new piece; prior imitation models on Waymo do not expose this kind of per-channel control. The practical target—letting simulation engineers isolate variables for AV edge-case testing—is a real bottleneck, and the authors correctly flag that steering metrics need physical-plausibility guardrails.

The soft spots are straightforward. The abstract claims competitive realism and monotone controllability on the Waymo Open Motion Dataset, yet shows no tables, no baseline scores, and no ablation numbers. Without those, it is impossible to tell whether the claimed independence across channels holds or whether the soft gates actually prevent hacking. The stress-test point about correlated returns is worth taking seriously; speed and acceleration often move together in real trajectories, so the variational posterior could easily entangle dimensions even with the curriculum. The paper would need covariance checks or explicit independence tests to close that gap.

This is for people building or using traffic simulators for autonomous-vehicle validation. A reader who already works with rectified flow or variational trajectory models might pick up the specific synthesis, but only if the full paper supplies the missing quantitative evidence. It is worth sending to peer review so the authors can add the experiments and address the correlation concern; the core idea is grounded enough to merit referee time even if heavy revision is likely.

Referee Report

2 major / 2 minor

Summary. The paper introduces Controllable Neural Variational Agents (CNeVA), which infers a per-agent low-dimensional Gaussian behavior latent from per-channel discounted returns via a closed-form conjugate variational update. This latent conditions a rectified-flow trajectory generator trained on a mixed channel-mask curriculum with classifier-free guidance. Soft eligibility gates replace hard thresholds with exponential decay to preserve gradients near reward thresholds. On the Waymo Open Motion Dataset the method reports competitive realism together with per-channel controllability (speed/acceleration, safety, map compliance) that higher-ranked imitation baselines lack, claiming monotone steering responses without stall-induced reward hacking.

Significance. If the reported per-channel controllability is shown to arise from genuinely independent latent dimensions, the framework would provide a practical route to interpretable, steerable traffic simulation for AV testing. The closed-form conjugate update and soft eligibility construction are efficient and gradient-friendly contributions that could be adopted more broadly.

major comments (2)

[Abstract and §4 (results)] The central claim of independent per-channel controllability rests on the assumption that per-channel discounted returns supply sufficiently independent information to support a disentangled Gaussian latent. No correlation analysis, covariance matrix of the returns, or ablation isolating latent independence is reported. If returns are correlated, as is typical when speed and acceleration co-vary in driving trajectories, the conjugate variational posterior will entangle dimensions, so that classifier-free guidance on one channel affects others despite the curriculum and gates.
[Abstract] Competitive realism and controllability are asserted without any quantitative tables, baseline numbers, or ablation details. This prevents assessment of whether the controllability metrics are statistically distinguishable from higher-ranked imitation models or whether the soft-eligibility improvement is load-bearing.

minor comments (2)

Clarify the precise functional form of the soft eligibility decay (e.g., the decay-rate hyper-parameter) and its interaction with the rectified-flow training objective.
[§4] Add explicit physical-plausibility guardrails (as mentioned in the final sentence of the abstract) to all reported steering-metric figures so that readers can verify absence of reward-hacking confounds.
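
The correlation analysis requested in the first major comment could start from something as simple as pairwise Pearson correlations between channel returns across agents. The sketch below assumes returns are available as equal-length lists per channel; the function names and the toy numbers are illustrative, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def return_correlations(returns_by_channel):
    """Correlation for every unordered pair of channels."""
    names = list(returns_by_channel)
    return {
        (a, b): pearson(returns_by_channel[a], returns_by_channel[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


corr = return_correlations({
    "speed": [1.0, 2.0, 3.0],
    "accel": [2.0, 4.0, 6.0],
    "safety": [3.0, 2.0, 1.0],
})
```

Pairs with correlation near ±1, like the toy speed and accel series here, are the ones where steering a single latent dimension is most likely to leak into another channel.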

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the independence of the behavior latents and the need for quantitative support in the abstract. We address each point below and will incorporate revisions to strengthen the manuscript.


Referee: [Abstract and §4 (results)] The central claim of independent per-channel controllability rests on the assumption that per-channel discounted returns supply sufficiently independent information to support a disentangled Gaussian latent. No correlation analysis, covariance matrix of the returns, or ablation isolating latent independence is reported. If returns are correlated, as is typical when speed and acceleration co-vary in driving trajectories, the conjugate variational posterior will entangle dimensions, so that classifier-free guidance on one channel affects others despite the curriculum and gates.

Authors: We agree that explicit verification of latent independence strengthens the central claim. The per-channel returns are processed via a closed-form conjugate variational update, and the mixed channel-mask curriculum plus classifier-free guidance are designed to encourage disentanglement, but we did not report a covariance matrix of the returns or a dedicated ablation on dimension independence. In the revision we will add (i) the empirical covariance matrix of the per-channel discounted returns on the Waymo training set and (ii) an ablation that measures cross-channel interference when steering is applied to a single latent dimension while holding others fixed. These additions will either confirm sufficient independence or quantify the residual entanglement. revision: yes
Referee: [Abstract] Competitive realism and controllability are asserted without any quantitative tables, baseline numbers, or ablation details. This prevents assessment of whether the controllability metrics are statistically distinguishable from higher-ranked imitation models or whether the soft-eligibility improvement is load-bearing.

Authors: The detailed quantitative results, including realism metrics, baseline comparisons, and ablation studies on soft eligibility, appear in §4 and the supplementary tables. To make the abstract self-contained we will revise it to include the key numerical values (e.g., realism scores relative to the top imitation baselines and the controllability deltas with/without soft eligibility) while remaining within length limits. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain


The paper's method infers Gaussian behavior latents from per-channel discounted returns via closed-form conjugate variational update, then conditions a rectified-flow trajectory generator using mixed channel-mask curriculum and classifier-free guidance, with soft eligibility gates added for reward signals. These are standard, externally validated components (variational inference, rectified flows, CFG) whose correctness does not depend on the Waymo Open Motion Dataset results or reduce the reported realism/controllability metrics to fitted inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing premises, and the evaluation uses external benchmarks without the controllability metrics being direct functions of the same fitted parameters.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework rests on standard variational inference assumptions and flow-matching objectives; no new physical entities are postulated. The soft eligibility gate introduces one tunable decay rate whose value is chosen to preserve gradient signal.

free parameters (1)

soft eligibility decay rate
Exponential decay constant that replaces hard binary thresholds; its value affects which agents contribute gradients near reward thresholds.
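
A hedged sketch of how such a gate might differ from a hard threshold is shown below. The functional form is an assumption, since the page does not give the paper's exact definition: agents at or below the threshold get full weight, and the weight decays exponentially with the excess above it. The 5 m clearance threshold echoes the Figure 4 caption text; `decay_scale` is a hypothetical name for the free parameter.

```python
import math


def hard_gate(value, threshold):
    """Binary eligibility: full weight at or below the threshold, none above."""
    return 1.0 if value <= threshold else 0.0


def soft_gate(value, threshold, decay_scale):
    """Smooth eligibility: weight decays as exp(-excess / decay_scale) above threshold."""
    excess = max(0.0, value - threshold)
    return math.exp(-excess / decay_scale)


# An agent with 7 m clearance against a 5 m threshold.
hard_weight = hard_gate(7.0, 5.0)
soft_weight = soft_gate(7.0, 5.0, decay_scale=2.0)
```

The hard gate drops this agent entirely, while the soft gate keeps it with weight exp(−1), roughly 0.37, so it still contributes a training signal. A smaller `decay_scale` approaches the hard threshold and a larger one admits agents far from the boundary.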

axioms (2)

domain assumption: Behavior can be summarized by a low-dimensional Gaussian latent whose dimensions align with reward channels.
Invoked when the per-channel returns are mapped to the latent via conjugate update.
domain assumption: Rectified-flow trajectory generator trained with classifier-free guidance can be conditioned on the latent without mode collapse.
Required for the controllability claims.

pith-pipeline@v0.9.1-grok · 5735 in / 1416 out tokens · 27355 ms · 2026-07-03T10:38:02.829533+00:00 · methodology


Reference graph

Works this paper leans on

157 extracted references · 13 canonical work pages · 1 internal anchor

[1]

Weath- erdepth: Curriculum contrastive learning for self-supervised depth estimation under adverse weather conditions

Yulong Cao, Boris Ivanovic, Chaowei Xiao, and Marco Pavone. Reinforcement learning with human feedback for realistic traffic simulation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 14428--14434, May 2024. doi:10.1109/ICRA57147.2024.10610878

work page doi:10.1109/icra57147.2024.10610878 2024
[2]

Editing driver character: Socially-controllable behavior generation for interactive traffic simulation

Wei-Jer Chang, Chen Tang, Chenran Li, Yeping Hu, Masayoshi Tomizuka, and Wei Zhan. Editing driver character: Socially-controllable behavior generation for interactive traffic simulation. IEEE Robotics and Automation Letters, 8 0 (9): 0 5432--5439, September 2023. doi:10.1109/LRA.2023.3291897

work page doi:10.1109/lra.2023.3291897 2023
[3]

SAFE-SIM : Safety-critical closed-loop traffic simulation with diffusion-controllable adversaries

Wei-Jer Chang, Francesco Pittaluga, Masayoshi Tomizuka, Wei Zhan, and Manmohan Chandraker. SAFE-SIM : Safety-critical closed-loop traffic simulation with diffusion-controllable adversaries. In European Conference on Computer Vision (ECCV), 2024

2024
[4]

SPACeR : Self-play anchoring with centralized reference models

Wei-Jer Chang, Akshay Rangesh, Kevin Joseph, Matthew Strong, Masayoshi Tomizuka, Yihan Hu, and Wei Zhan. SPACeR : Self-play anchoring with centralized reference models. In International Conference on Learning Representations (ICLR), 2026

2026
[5]

Human-compatible driving partners through data-regularized self-play reinforcement learning, 2024

Daphne Cornelisse and Eugene Vinitsky. Human-compatible driving partners through data-regularized self-play reinforcement learning, 2024

2024
[6]

Robust autonomy emerges from self-play

Marco Cusumano-Towner, David Hafner, Alexander Hertzberg, Brody Huval, Aleksei Petrenko, Eugene Vinitsky, Erik Wijmans, Taylor Killian, Stuart Bowers, Ozan Sener, Philipp Kraehenbuehl, and Vladlen Koltun. Robust autonomy emerges from self-play. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of PMLR, pages 11710--11737, 2025

2025
[7]

Large scale interactive motion forecasting for autonomous driving: The W aymo open motion dataset

Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Benjamin Sapp, Charles R Qi, Yin Zhou, et al. Large scale interactive motion forecasting for autonomous driving: The W aymo open motion dataset. In IEEE/CVF International Conference on Computer Vision (ICCV), 2021

2021
[8]

Classifier-free diffusion guidance, 2022

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance, 2022

2022
[9]

Solving motion planning tasks with a scalable generative model

Yihan Hu, Siqi Chai, Zhening Yang, Jingyu Qian, Kun Li, Wenxin Shao, Haichao Zhang, Wei Xu, and Qiang Liu. Solving motion planning tasks with a scalable generative model. In Ale s Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, and G \"u l Varol, editors, Computer Vision -- ECCV 2024, pages 386--404, Cham, 2025. Springer Nature Switzerland

2024
[10]

Versatile behavior diffusion for generalized traffic agent simulation

Zhiyu Huang, Zixu Zhang, Ameya Vaidya, Yuxiao Chen, Jaime Fernández Fisac, and Chen Lv. Versatile behavior diffusion for generalized traffic agent simulation. IEEE Transactions on Intelligent Transportation Systems, pages 1--17, 2026

2026
[11]

MotionDiffuser : Controllable multi-agent motion prediction using diffusion

Chiyu Max Jiang, Andre Cornman, Cheolho Park, Benjamin Sapp, Yin Zhou, and Dragomir Anguelov. MotionDiffuser : Controllable multi-agent motion prediction using diffusion. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

2023
[12]

Scenediffuser: Efficient and controllable driving simulation initialization and rollout

Chiyu Max Jiang, Yijing Bai, Andre Cornman, Christopher Davis, Xiukun Huang, Hong Jeon, Sakshum Kulshrestha, John Lambert, Shuangyu Li, Xuanyu Zhou, Carlos Fuertes, Chang Yuan, Mingxing Tan, Yin Zhou, and Dragomir Anguelov. Scenediffuser: Efficient and controllable driving simulation initialization and rollout. In Advances in Neural Information Processing...

2024
[13]

Learning in graphical models

Michael Irwin Jordan. Learning in graphical models. MIT press, 1999

1999
[14]

Flow matching for generative modeling

Yaron Lipman, Ricky T Q Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. In International Conference on Learning Representations (ICLR), 2023

2023
[15]

Stage: Style-controllable action generation for personalized autonomous driving

Zihao Liu, Xing Liu, Yizhai Zhang, and Panfeng Huang. Stage: Style-controllable action generation for personalized autonomous driving. IEEE Robotics and Automation Letters, 11 0 (2): 0 2130--2137, February 2026. doi:10.1109/LRA.2025.3640974

work page doi:10.1109/lra.2025.3640974 2026
[16]

The W aymo open sim agents challenge

Nico Montali, John Lambert, Paul Mougin, Alex Kuefler, Nicholas Rhinehart, Michelle Li, Cole Gulino, Tristan Emrich, Zoey Yang, Shimon Whiteson, et al. The W aymo open sim agents challenge. In Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2023

2023
[17]

Wayformer: Motion forecasting via simple & efficient attention networks

Nigamaa Nayakanti, Rami Al-Rfou, Aurick Zhou, Kratarth Goel, Khaled S Refaat, and Benjamin Sapp. Wayformer: Motion forecasting via simple & efficient attention networks. In IEEE International Conference on Robotics and Automation (ICRA), 2023

2023
[18]

Scene transformer: A unified architecture for predicting multiple agent trajectories, 2022

Jiquan Ngiam, Benjamin Caine, Vijay Vasudevan, Zhengdong Zhang, Hao-Tien Lewis Chiang, Jeffrey Ling, Rebecca Roelofs, Alex Bewley, Chenxi Liu, Ashish Venugopal, David Weiss, Ben Sapp, Zhifeng Chen, and Jonathon Shlens. Scene transformer: A unified architecture for predicting multiple agent trajectories, 2022

2022
[19]

Advancing multi-agent traffic simulation via r1-style reinforcement fine-tuning, 2026

Muleilan Pei, Shaoshuai Shi, and Shaojie Shen. Advancing multi-agent traffic simulation via r1-style reinforcement fine-tuning, 2026

2026
[20]

Trajeglish : Traffic modeling as next-token prediction

Jonah Philion, Xue Bin Peng, and Sanja Fidler. Trajeglish : Traffic modeling as next-token prediction. In International Conference on Learning Representations (ICLR), 2024

2024
[21]

Scenario diffusion: Controllable driving scenario generation with diffusion

Ethan Pronovost, Meghana Reddy Ganesina, Noureldin Hendy, Zeyu Wang, Andres Morales, Kai Wang, and Nick Roy. Scenario diffusion: Controllable driving scenario generation with diffusion. In Advances in Neural Information Processing Systems (NeurIPS), 2023

2023
[22]

Guibas, Sanja Fidler, and Or Litany

Davis Rempe, Jonah Philion, Leonidas J. Guibas, Sanja Fidler, and Or Litany. Generating useful accident-prone driving scenarios via a learned traffic prior. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

2022
[23]

CtRL-Sim : Reactive and controllable driving agents with offline reinforcement learning

Luke Rowe, Roger Girgis, Anthony Gosselin, Bruno Carrez, Florian Golemo, Felix Heide, Liam Paull, and Christopher Pal. CtRL-Sim : Reactive and controllable driving agents with offline reinforcement learning. In Conference on Robot Learning (CoRL), 2024

2024
[24]

Scenario dreamer: Vectorized latent diffusion for generating driving simulation environments

Luke Rowe, Roger Girgis, Anthony Gosselin, Liam Paull, Christopher Pal, and Felix Heide. Scenario dreamer: Vectorized latent diffusion for generating driving simulation environments. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

2025
[25]

MotionLM : Multi-agent motion forecasting as language modeling

Ari Seff, Brian Cera, Dian Chen, Mason Ng, Aurick Zhou, Nigamaa Nayakanti, Khaled S Refaat, Rami Al-Rfou, and Benjamin Sapp. MotionLM : Multi-agent motion forecasting as language modeling. In IEEE/CVF International Conference on Computer Vision (ICCV), 2023

2023
[26]

Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying

Shaoshuai Shi, Li Jiang, Dengxin Dai, and Bernt Schiele. Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46 0 (5): 0 3955--3971, May 2024. doi:10.1109/TPAMI.2024.3352811

work page doi:10.1109/tpami.2024.3352811 2024
[27]

TrafficSim : Learning to simulate realistic multi-agent behaviors

Simon Suo, Sebastian Regalado, Sergio Casas, and Raquel Urtasun. TrafficSim : Learning to simulate realistic multi-agent behaviors. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

2021
[28]

SceneDiffuser++ : City-scale traffic simulation via a generative world model

Shuhan Tan, John Lambert, Hong Jeon, Sakshum Kulshrestha, Yijing Bai, Jing Luo, Dragomir Anguelov, Mingxing Tan, and Chiyu Max Jiang. SceneDiffuser++ : City-scale traffic simulation via a generative world model. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025 a

2025
[29]

Flow matching-based autonomous driving planning with advanced interactive behavior modeling

Tianyi Tan, Yinan Zheng, Ruiming Liang, Zexu Wang, Kexin ZHENG, Jinliang Zheng, Jianxiong Li, Xianyuan Zhan, and Jingjing Liu. Flow matching-based autonomous driving planning with advanced interactive behavior modeling. In Advances in Neural Information Processing Systems (NeurIPS), 2025 b

2025
[30]

Smart: Scalable multi-agent real-time motion generation via next-token prediction

Wei Wu, Xiaoxin Feng, Ziyan Gao, and Yuheng Kan. Smart: Scalable multi-agent real-time motion generation via next-token prediction. In Advances in Neural Information Processing Systems, volume 37, pages 114048--114071, 2024

2024
[31]

GoalFlow : Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving

Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, and Wei Yin. GoalFlow : Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

2025
[32]

Diverse critical interaction generation for planning and planner evaluation

Zhao-Heng Yin, Lingfeng Sun, Liting Sun, Masayoshi Tomizuka, and Wei Zhan. Diverse critical interaction generation for planning and planner evaluation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7036--7043, September 2021. doi:10.1109/IROS51168.2021.9636266

work page doi:10.1109/iros51168.2021.9636266 2021
[33]

Trajgen: Generating realistic and diverse trajectories with reactive and feasible agent behaviors for autonomous driving

Qichao Zhang, Yinfeng Gao, Yikang Zhang, Youtian Guo, Dawei Ding, Yunpeng Wang, Peng Sun, and Dongbin Zhao. Trajgen: Generating realistic and diverse trajectories with reactive and feasible agent behaviors for autonomous driving. IEEE Transactions on Intelligent Transportation Systems, 23 0 (12): 0 24474--24487, December 2022. ISSN 1558-0016. doi:10.1109/...

work page doi:10.1109/tits.2022.3202185 2022
[34]

TrafficBots : Towards world models for autonomous driving simulation and motion prediction

Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu, and Luc Van Gool. TrafficBots : Towards world models for autonomous driving simulation and motion prediction. In IEEE International Conference on Robotics and Automation (ICRA), 2023

2023
[35]

Closed-loop supervised fine-tuning of tokenized traffic models

Zhejun Zhang, Peter Karkus, Maximilian Igl, Wenhao Ding, Yuxiao Chen, Boris Ivanovic, and Marco Pavone. Closed-loop supervised fine-tuning of tokenized traffic models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025 a

2025
[36]

TrajTok : Technical report for 2025 Waymo open sim agents challenge

Zhiyuan Zhang, Xiaosong Jia, Guanyu Chen, Qifeng Li, and Junchi Yan. TrajTok : Technical report for 2025 Waymo open sim agents challenge. Technical report, Shanghai Jiao Tong University, 2025 b

2025
[37]

Language-guided traffic simulation via scene-level diffusion

Ziyuan Zhong, Davis Rempe, Yuxiao Chen, Boris Ivanovic, Yulong Cao, Danfei Xu, Marco Pavone, and Baishakhi Ray. Language-guided traffic simulation via scene-level diffusion. In Conference on Robot Learning (CoRL), 2023 a

2023
[38]

Guided conditional diffusion for controllable traffic simulation

Ziyuan Zhong, Davis Rempe, Danfei Xu, Yuxiao Chen, Sushant Veer, Tong Che, Baishakhi Ray, and Marco Pavone. Guided conditional diffusion for controllable traffic simulation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 3560--3566, 2023 b

2023
[39]

BehaviorGPT : Smart agent simulation for autonomous driving with next-patch prediction

Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, and Chun Jason Xue. BehaviorGPT : Smart agent simulation for autonomous driving with next-patch prediction. In Advances in Neural Information Processing Systems (NeurIPS), 2024

2024
[40]

Chang, Wei-Jer and Rangesh, Akshay and Joseph, Kevin and Strong, Matthew and Tomizuka, Masayoshi and Hu, Yihan and Zhan, Wei , booktitle =
[41]

HIQL: Offline Goal-Conditioned RL with Latent States as Actions , volume =

Park, Seohong and Ghosh, Dibya and Eysenbach, Benjamin and Levine, Sergey , booktitle =. HIQL: Offline Goal-Conditioned RL with Latent States as Actions , volume =
[42]

Offline Reinforcement Learning with Implicit

Kostrikov, Ilya and Nair, Ashvin and Levine, Sergey , booktitle =. Offline Reinforcement Learning with Implicit
[43]

Conservative

Kumar, Aviral and Zhou, Aurick and Tucker, George and Levine, Sergey , booktitle =. Conservative
[44]

Decision Transformer: Reinforcement Learning via Sequence Modeling

Decision Transformer: Reinforcement Learning via Sequence Modeling. In Advances in Neural Information Processing Systems (NeurIPS).
[45]

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning. arXiv preprint arXiv:1910.00177.
[46]

Ashvin Nair, Abhishek Gupta, Murtaza Dalal, and Sergey Levine.
[47]

A Minimalist Approach to Offline Reinforcement Learning

A Minimalist Approach to Offline Reinforcement Learning. In Advances in Neural Information Processing Systems (NeurIPS).
[48]

Learning to Reach Goals via Iterated Supervised Learning

Learning to Reach Goals via Iterated Supervised Learning. In International Conference on Learning Representations (ICLR).
[49]

Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, and Sergey Levine.
[50]

Rethinking Goal-Conditioned Supervised Learning and its Connection to Offline

Rui Yang, Yiming Lu, Wenhao Li, Hao Sun, Meng Fang, Yali Du, Xiu Li, Lei Han, and Chongjie Zhang. Rethinking Goal-Conditioned Supervised Learning and its Connection to Offline
[51]

Contrastive Learning as Goal-Conditioned Reinforcement Learning

Contrastive Learning as Goal-Conditioned Reinforcement Learning. In Advances in Neural Information Processing Systems (NeurIPS).
[52]

Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning

Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning. In International Conference on Machine Learning (ICML).
[53]

Offline Goal-Conditioned Reinforcement Learning via

Yecheng Jason Ma, Jason Yan, Dinesh Jayaraman, and Osbert Bastani. Offline Goal-Conditioned Reinforcement Learning via
[54]

Seohong Park, Kevin Frans, Benjamin Eysenbach, and Sergey Levine.
[55]

Foundation Policies with

Seohong Park, Tobias Kreiman, and Sergey Levine. Foundation Policies with
[56]

Luke Rowe, Roger Girgis, Anthony Gosselin, Bruno Carrez, Florian Golemo, Felix Heide, Liam Paull, and Christopher Pal.
[57]

Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments.
[58]

Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, and Ofir Nachum.
[59]

Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu.
[60]

Data-Efficient Hierarchical Reinforcement Learning

Data-Efficient Hierarchical Reinforcement Learning. In Advances in Neural Information Processing Systems (NeurIPS).
[61]

Learning Multi-Level Hierarchies with Hindsight

Learning Multi-Level Hierarchies with Hindsight. In International Conference on Learning Representations (ICLR).
[62]

The Option-Critic Architecture

The Option-Critic Architecture. In AAAI Conference on Artificial Intelligence (AAAI).
[63]

Deep Hierarchical Planning from Pixels

Deep Hierarchical Planning from Pixels. In Advances in Neural Information Processing Systems (NeurIPS).
[64]

Relay Policy Learning: Solving Long Horizon Tasks via Imitation and Reinforcement Learning

Relay Policy Learning: Solving Long Horizon Tasks via Imitation and Reinforcement Learning. In Conference on Robot Learning (CoRL).
[65]

Projective Quasimetric Planning

Projective Quasimetric Planning. arXiv preprint arXiv:2506.18847.
[66]

Flattening Hierarchies with Policy Bootstrapping

Flattening Hierarchies with Policy Bootstrapping. arXiv preprint arXiv:2505.14975.
[67]

Nico Montali, John Lambert, Paul Mougin, Alex Kuefler, Nicholas Rhinehart, Michelle Li, Cole Gulino, Tristan Emrich, Zoey Yang, Shimon Whiteson, et al. The
[68]

Large Scale Interactive Motion Forecasting for Autonomous Driving: The

Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Benjamin Sapp, Charles R Qi, Yin Zhou, et al. Large Scale Interactive Motion Forecasting for Autonomous Driving: The
[69]

Motion Transformer with Global Intention Localization and Local Movement Refinement

Motion Transformer with Global Intention Localization and Local Movement Refinement. In Advances in Neural Information Processing Systems (NeurIPS).
[70]

Ari Seff, Brian Cera, Dian Chen, Mason Ng, Aurick Zhou, Nigamaa Nayakanti, Khaled S Refaat, Rami Al-Rfou, and Benjamin Sapp.
[71]

Jonah Philion, Xue Bin Peng, and Sanja Fidler.
[72]

SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction

Wei Wu, Xiaoxin Feng, Ziyan Gao, and Yuheng Kan. SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction.
[73]

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[74]

Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, and Chun Jason Xue.
[75]

KiGRAS: Kinematic-Driven Generative Model for Realistic Agent Simulation

Jianbo Zhao, Jiaheng Zhuang, Qibin Zhou, Taiyu Ban, Ziyao Xu, Hangning Zhou, Junhe Wang, Guoan Wang, Zhiheng Li, and Bin Li. KiGRAS: Kinematic-Driven Generative Model for Realistic Agent Simulation.
[76]

High-Dimensional Continuous Control Using Generalized Advantage Estimation

High-Dimensional Continuous Control Using Generalized Advantage Estimation. In International Conference on Learning Representations (ICLR).
[77]

Versatile Behavior Diffusion for Generalized Traffic Agent Simulation

Zhiyu Huang, Zixu Zhang, Ameya Vaidya, Yuxiao Chen, Jaime Fernández Fisac, and Chen Lv. Versatile Behavior Diffusion for Generalized Traffic Agent Simulation.
[78]

Cole Gulino, Justin Fu, Wenjie Luo, George Tucker, Eli Bronstein, Yiren Lu, Jean Harb, Xinlei Pan, Yan Wang, Xiangyu Chen, et al.
[79]

Saman Kazemkhani, Aarav Pandya, Daphne Cornelisse, Brennan Shacklett, and Eugene Vinitsky.
[80]

SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout

Chiyu Max Jiang, Yijing Bai, Andre Cornman, Christopher Davis, Xiukun Huang, Hong Jeon, Sakshum Kulshrestha, John Lambert, Shuangyu Li, Xuanyu Zhou, Carlos Fuertes, Chang Yuan, Mingxing Tan, Yin Zhou, and Dragomir Anguelov. SceneDiffuser: Efficient and Controllable Driving Simulation Initi...

Showing first 80 references.
