
arxiv: 2607.02496 · v1 · pith:2GFZQLDWnew · submitted 2026-07-02 · 💻 cs.RO · cs.LG

Controllable Sim Agents with Behavior Latents

Pith reviewed 2026-07-03 10:38 UTC · model grok-4.3

classification 💻 cs.RO cs.LG
keywords controllable simulation agents · behavior latents · variational inference · rectified flow · traffic simulation · Waymo Open Motion Dataset · soft eligibility gates · classifier-free guidance

The pith

CNeVA infers a per-agent Gaussian behavior latent from channel-specific returns to steer simulated traffic agents along independent axes such as speed and safety.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that makes traffic simulation agents both realistic imitators of logged data and steerable along separate behavioral dimensions. It derives a low-dimensional Gaussian latent for each agent directly from per-channel discounted returns using a closed-form variational update, then feeds this latent into a rectified-flow trajectory generator trained with masked curricula. Soft eligibility gates replace hard reward thresholds to keep gradients flowing for agents near decision boundaries. Experiments on the Waymo Open Motion Dataset show the model matches the realism of top imitation baselines while adding controllable axes for speed, acceleration, safety, and map compliance that prior models lack. The central demonstration is that these controls remain monotone and free of obvious reward-hacking artifacts when physical guardrails are applied.
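The per-channel discounted returns that feed the latent can be illustrated with a short sketch. The discount factor, the channel names, and the helper names `discounted_return` and `per_channel_returns` are assumptions made for illustration; this page does not give the paper's reward definitions.

```python
from typing import Dict, List


def discounted_return(rewards: List[float], gamma: float = 0.95) -> float:
    """Return sum_t gamma**t * rewards[t] for a single reward channel."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def per_channel_returns(
    channel_rewards: Dict[str, List[float]], gamma: float = 0.95
) -> Dict[str, float]:
    """Compute one discounted return per channel (e.g. speed, safety)."""
    return {
        name: discounted_return(rewards, gamma)
        for name, rewards in channel_rewards.items()
    }


# Hypothetical per-step rewards for one agent over three steps.
example = per_channel_returns(
    {"speed": [1.0, 1.0, 1.0], "safety": [0.0, 0.0, 1.0]}, gamma=0.5
)
```

Each channel is reduced to a single scalar per agent, which is what lets one latent dimension be associated with one channel.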

Core claim

CNeVA learns to infer a per-agent Gaussian behavior latent from per-channel discounted returns via a closed-form conjugate variational update, conditioning a rectified-flow trajectory generator trained on a mixed channel-mask curriculum for classifier-free guidance. Soft eligibility gates replace hard binary thresholds with smooth exponential decay to preserve gradient signals. On the Waymo Open Motion Dataset the model reaches competitive realism while exposing per-channel controllability; speed- and acceleration-based steering yields monotone responses without stall-induced reward hacking, safety controllability is monotone and substantial, and map compliance becomes steerable under a context-residual return measure.

What carries the argument

The per-agent Gaussian behavior latent inferred from per-channel discounted returns via closed-form conjugate variational update, which conditions the rectified-flow trajectory generator under classifier-free guidance.
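To make "closed-form conjugate update" concrete, here is a minimal sketch of the textbook Gaussian-Gaussian case with a diagonal latent. It assumes each channel's return is a noisy observation of the matching latent dimension with known precision; the paper's actual likelihood and parameterization are not given on this page, so the function name and all parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def conjugate_gaussian_update(
    prior_mean: List[float],
    prior_prec: List[float],
    returns: List[Optional[float]],
    obs_prec: List[float],
) -> Tuple[List[float], List[float]]:
    """Per-dimension posterior mean and precision for a diagonal Gaussian.

    A return of None marks a masked channel, whose posterior equals the prior.
    """
    post_mean, post_prec = [], []
    for m0, t0, y, t_obs in zip(prior_mean, prior_prec, returns, obs_prec):
        if y is None:
            # No evidence for this channel: keep the prior unchanged.
            post_mean.append(m0)
            post_prec.append(t0)
            continue
        # Precisions add; the mean is the precision-weighted average.
        t_post = t0 + t_obs
        post_mean.append((t0 * m0 + t_obs * y) / t_post)
        post_prec.append(t_post)
    return post_mean, post_prec


mean, prec = conjugate_gaussian_update(
    prior_mean=[0.0, 0.0],
    prior_prec=[1.0, 1.0],
    returns=[2.0, None],  # second channel masked
    obs_prec=[3.0, 3.0],
)
```

Because the update has no iterative inner loop, it is cheap to run per agent. Handling a masked channel by falling back to the prior is one plausible way such an update could interact with a channel-mask curriculum.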

If this is right

  • The model attains competitive realism on the Waymo Open Motion Dataset benchmark.
  • Speed- and acceleration-based steering produces monotone responses without stall-induced reward hacking.
  • Safety controllability is monotone and substantial once soft eligibility gates are introduced.
  • Map compliance becomes steerable under a context-residual return measure.
  • Steering metrics must be interpreted together with physical-plausibility guardrails to avoid reward-hacking confounds.
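The last point can be illustrated with two simple guardrail statistics: a stall rate and the fraction of logged speed that the simulation retains. This is a sketch under stated assumptions. The 0.5 m/s stall threshold, the per-timestep definition of a stall, and the function names are hypothetical rather than the paper's definitions.

```python
from statistics import mean
from typing import List


def stall_rate(speeds: List[float], stall_threshold: float = 0.5) -> float:
    """Fraction of samples whose speed (m/s) falls below the stall threshold."""
    return sum(1 for v in speeds if v < stall_threshold) / len(speeds)


def speed_retention(sim_speeds: List[float], logged_speeds: List[float]) -> float:
    """Mean simulated speed as a fraction of mean logged speed."""
    return mean(sim_speeds) / mean(logged_speeds)


def passes_guardrails(
    sim_speeds: List[float],
    logged_speeds: List[float],
    max_stall: float = 0.2,       # hypothetical tolerance
    min_retention: float = 0.8,   # hypothetical tolerance
) -> bool:
    """Trust a steering score only if the rollout still looks like driving."""
    return (
        stall_rate(sim_speeds) <= max_stall
        and speed_retention(sim_speeds, logged_speeds) >= min_retention
    )


healthy = passes_guardrails([4.0, 6.0, 5.0, 5.0], [5.0, 5.0, 5.0, 5.0])
stalled = passes_guardrails([0.0, 0.1, 0.2, 6.0], [5.0, 5.0, 5.0, 5.0])
```

A steering metric that improves while `stalled`-style rollouts dominate would be a candidate reward-hacking artifact rather than genuine control.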

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same latent could be used to generate targeted edge-case scenarios for testing autonomous vehicle planners at scale.
  • The per-channel structure might transfer to other multi-agent domains such as pedestrian or drone simulation with minimal retraining.
  • Combining the latent with external map or weather inputs could produce controllable variations beyond the original training distribution.
  • The closed-form variational update may reduce the data volume needed to achieve controllable behavior compared with standard reinforcement-learning approaches.

Load-bearing premise

Per-channel discounted returns contain enough independent information to support a low-dimensional Gaussian latent whose dimensions can be varied independently without inducing correlations or reward-hacking artifacts in the generated trajectories.

What would settle it

A controlled experiment in which varying one latent dimension produces non-monotone trajectory changes or visible stall artifacts when steering speed or acceleration on held-out Waymo scenes.
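A check of this kind could be scripted roughly as follows. The sketch assumes the experimenter has already swept one latent dimension over increasing values and recorded the corresponding channel response; `is_monotone` and the tolerance parameter are illustrative names, not part of the paper.

```python
from typing import Sequence


def is_monotone(responses: Sequence[float], tol: float = 0.0) -> bool:
    """True if responses never drop by more than tol between adjacent sweep points."""
    return all(b >= a - tol for a, b in zip(responses, responses[1:]))


# Hypothetical responses measured at increasing values of one latent dimension.
clean_sweep = [0.1, 0.3, 0.3, 0.9]
broken_sweep = [0.1, 0.5, 0.2]

clean_ok = is_monotone(clean_sweep)
broken_ok = is_monotone(broken_sweep)
```

A small positive `tol` can absorb seed-to-seed noise. A drop larger than that on held-out scenes would count against the monotonicity claim.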

Figures

Figures reproduced from arXiv: 2607.02496 by Juanwu Lu, Junyu Zhu, Ziran Wang.

Figure 1: Probabilistic graphical models for simulated agents. (a) The standard hidden Markov …

Figure 2: CNeVA infers a per-agent behavior latent …

Figure 3: CNeVA qualitative rollouts at ρ = 1, w = 1.5. Fidelity. CNeVA achieves a null-path minADE of 1.113 ± 0.011 m and an offroad rate of 32.5 ± 0.2% at 200K steps on the WOMD validation split and minADE = 1.80 m on the WOMD testing split. The off-road rate climbs over the rollout to roughly 1.3 to 1.9× the logged-data rate by 8 s (logged ≈ 17%), reflecting open-loop drift rather than a static gap. Under the ident…

Figure 4: Reward-hacking contrast. Left: speed CSM is inflated 6× in the early checkpoint relative to the main model. Right: physical plausibility guardrails reveal that the early ablation achieves its CSM by stalling (76% stall, 61% of GT speed), while the main model retains 95% of GT speed. (clearance > 5 m, TTC > 6 s) receive no safety label, so the generator sees a near-zero safety signal. Replacing the threshol…

Figure 5: Simplified factor graph corresponding to the relaxed joint equation 3: the latent state …

Figure 6: Empirical histogram of the per-channel discounted return …

Figure 7: CSM diagonal ∆Rk for CNeVA versus the hard-eligibility ablation at ρ = 1, w = 1.5 (open-loop, context-residual return). Left: semantic channels (safety, map). Right: kinematic channels (speed, accel). Error bars show ±1 std over 5 seeds.

Figure 8: Map CSM diagonal (∆Rmap) across three return measures, evaluated on the 200K soft-eligibility model at ρ = 1, w = 1.5 (open-loop). Map steering is controllable only under the context-residual definition (+0.61); the physical-offroad (−0.12) and lane-centerline (≈ 0) measures show no measurable physical-space response.
Original abstract

Realistic traffic simulation requires agents that imitate logged behavior and can also be steered along interpretable axes. Such controllability enables engineers to isolate variables, reproduce specific edge cases, and test autonomous systems without real-world risk. We introduce Controllable Neural Variational Agents (CNeVA), a controllable simulated-agent framework that learns to infer a per-agent Gaussian behavior latent from per-channel discounted returns via a closed-form conjugate variational update, conditioning a rectified-flow trajectory generator trained on a mixed channel-mask curriculum for classifier-free guidance. To tackle scarcity in reward signals, we propose soft eligibility gates that replace hard binary thresholds with smooth exponential decay, preserving the gradient signal for near-threshold agents. On the Waymo Open Motion Dataset, CNeVA attains competitive realism on the benchmark while exposing per-channel controllability that the higher-ranked imitation models lack. Speed- and acceleration-based steering produces monotone responses without stall-induced reward hacking. Safety controllability is monotone and substantial with the introduction of soft eligibility. We manage to achieve steerable map compliance under a context-residual return measure. Furthermore, our experiment demonstrates that steering metrics must be read alongside physical-plausibility guardrails to avoid reward-hacking confounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Controllable Neural Variational Agents (CNeVA), which infers a per-agent low-dimensional Gaussian behavior latent from per-channel discounted returns via a closed-form conjugate variational update. This latent conditions a rectified-flow trajectory generator trained on a mixed channel-mask curriculum with classifier-free guidance. Soft eligibility gates replace hard thresholds with exponential decay to preserve gradients near reward thresholds. On the Waymo Open Motion Dataset the method reports competitive realism together with per-channel controllability (speed/acceleration, safety, map compliance) that higher-ranked imitation baselines lack, claiming monotone steering responses without stall-induced reward hacking.

Significance. If the reported per-channel controllability is shown to arise from genuinely independent latent dimensions, the framework would provide a practical route to interpretable, steerable traffic simulation for AV testing. The closed-form conjugate update and soft eligibility construction are efficient and gradient-friendly contributions that could be adopted more broadly.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (results): the central claim of independent per-channel controllability rests on the assumption that per-channel discounted returns supply sufficiently independent information to support a disentangled Gaussian latent. No correlation analysis, covariance matrix of the returns, or ablation isolating latent independence is reported. If returns are correlated—as is typical when speed and acceleration co-vary in driving trajectories—the conjugate variational posterior will entangle dimensions, so that classifier-free guidance on one channel affects others despite the curriculum and gates.
  2. [Abstract] Abstract: competitive realism and controllability are asserted without any quantitative tables, baseline numbers, or ablation details. This prevents assessment of whether the controllability metrics are statistically distinguishable from higher-ranked imitation models or whether the soft-eligibility improvement is load-bearing.
minor comments (2)
  1. Clarify the precise functional form of the soft eligibility decay (e.g., the decay-rate hyper-parameter) and its interaction with the rectified-flow training objective.
  2. [§4] Add explicit physical-plausibility guardrails (as mentioned in the final sentence of the abstract) to all reported steering-metric figures so that readers can verify absence of reward-hacking confounds.
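The correlation analysis requested in the first major comment could start from something as simple as the sketch below, which computes pairwise Pearson correlations between per-channel returns across agents. The channel names and toy values are made up for illustration, and `pearson` and `correlation_matrix` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length samples (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # correlation is undefined for a constant channel
    return cov / math.sqrt(vx * vy)


def correlation_matrix(
    returns: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Pairwise correlations between channels, keyed by (channel_a, channel_b)."""
    names = list(returns)
    return {(a, b): pearson(returns[a], returns[b]) for a in names for b in names}


# Toy returns for four agents; real values would come from the training set.
toy_returns = {
    "speed": [1.0, 2.0, 3.0, 4.0],
    "accel": [2.0, 4.0, 6.0, 8.0],
    "safety": [1.0, -1.0, -1.0, 1.0],
}
corr = correlation_matrix(toy_returns)
```

Large off-diagonal entries, such as the speed/accel pair in this toy data, would be the kind of evidence that bears on the entanglement concern.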

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the independence of the behavior latents and the need for quantitative support in the abstract. We address each point below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (results): the central claim of independent per-channel controllability rests on the assumption that per-channel discounted returns supply sufficiently independent information to support a disentangled Gaussian latent. No correlation analysis, covariance matrix of the returns, or ablation isolating latent independence is reported. If returns are correlated—as is typical when speed and acceleration co-vary in driving trajectories—the conjugate variational posterior will entangle dimensions, so that classifier-free guidance on one channel affects others despite the curriculum and gates.

    Authors: We agree that explicit verification of latent independence strengthens the central claim. The per-channel returns are processed via a closed-form conjugate variational update, and the mixed channel-mask curriculum plus classifier-free guidance are designed to encourage disentanglement, but we did not report a covariance matrix of the returns or a dedicated ablation on dimension independence. In the revision we will add (i) the empirical covariance matrix of the per-channel discounted returns on the Waymo training set and (ii) an ablation that measures cross-channel interference when steering is applied to a single latent dimension while holding others fixed. These additions will either confirm sufficient independence or quantify the residual entanglement. revision: yes

  2. Referee: [Abstract] Abstract: competitive realism and controllability are asserted without any quantitative tables, baseline numbers, or ablation details. This prevents assessment of whether the controllability metrics are statistically distinguishable from higher-ranked imitation models or whether the soft-eligibility improvement is load-bearing.

    Authors: The detailed quantitative results, including realism metrics, baseline comparisons, and ablation studies on soft eligibility, appear in §4 and the supplementary tables. To make the abstract self-contained we will revise it to include the key numerical values (e.g., realism scores relative to the top imitation baselines and the controllability deltas with/without soft eligibility) while remaining within length limits. revision: yes
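The single-dimension steering ablation promised in the first response could be summarized with a ratio like the one sketched here. The matrix layout (row = steered channel, column = measured change in return), the `interference_ratios` helper, and the toy numbers are all assumptions for illustration, not the paper's metric.

```python
from typing import List


def interference_ratios(delta: List[List[float]]) -> List[float]:
    """For each steered channel i: max over j != i of |delta[i][j]| / |delta[i][i]|."""
    ratios = []
    for i, row in enumerate(delta):
        diag = abs(row[i])
        off = max((abs(v) for j, v in enumerate(row) if j != i), default=0.0)
        # An unresponsive diagonal makes the ratio undefined; report infinity.
        ratios.append(off / diag if diag > 0 else float("inf"))
    return ratios


# Toy response matrix: steering channel 0 moves channel 1 by 0.2, and so on.
toy_delta = [[1.0, 0.2], [0.1, 0.5]]
toy_ratios = interference_ratios(toy_delta)
```

Ratios near zero would support independent control, while ratios near or above one would indicate that steering one channel mostly moves another.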

Circularity Check

0 steps flagged

No circularity in derivation chain


The paper's method infers Gaussian behavior latents from per-channel discounted returns via closed-form conjugate variational update, then conditions a rectified-flow trajectory generator using mixed channel-mask curriculum and classifier-free guidance, with soft eligibility gates added for reward signals. These are standard, externally validated components (variational inference, rectified flows, CFG) whose correctness does not depend on the Waymo Open Motion Dataset results or reduce the reported realism/controllability metrics to fitted inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing premises, and the evaluation uses external benchmarks without the controllability metrics being direct functions of the same fitted parameters.
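For readers unfamiliar with how classifier-free guidance combines with a rectified-flow sampler, a generic sketch follows. It uses one common guidance convention, `v_uncond + w * (v_cond - v_uncond)`, and plain Euler integration from t = 0 to t = 1. Whether the paper uses this exact convention or integrator is not stated on this page, and the toy velocity fields stand in for a learned network.

```python
from typing import Callable, List

Vector = List[float]


def guided_velocity(v_cond: Vector, v_uncond: Vector, w: float) -> Vector:
    """Blend conditional and unconditional velocities with guidance weight w."""
    return [u + w * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(
    x0: Vector, velocity_fn: Callable[[Vector, float], Vector], num_steps: int = 10
) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-ins for the latent-conditioned and null-conditioned velocity fields.
def toy_cond(x: Vector, t: float) -> Vector:
    return [2.0 for _ in x]


def toy_uncond(x: Vector, t: float) -> Vector:
    return [1.0 for _ in x]


sample = euler_sample(
    [0.0],
    lambda x, t: guided_velocity(toy_cond(x, t), toy_uncond(x, t), w=1.5),
    num_steps=4,
)
```

With w = 1 the blend reduces to the conditional field and with w = 0 to the unconditional one. Values above 1 extrapolate past the conditional prediction.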

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework rests on standard variational inference assumptions and flow-matching objectives; no new physical entities are postulated. The soft eligibility gate introduces one tunable decay rate whose value is chosen to preserve gradient signal.

free parameters (1)
  • soft eligibility decay rate
    Exponential decay constant that replaces hard binary thresholds; its value affects which agents contribute gradients near reward thresholds.
axioms (2)
  • domain assumption: Behavior can be summarized by a low-dimensional Gaussian latent whose dimensions align with reward channels
    Invoked when the per-channel returns are mapped to the latent via conjugate update.
  • domain assumption: Rectified-flow trajectory generator trained with classifier-free guidance can be conditioned on the latent without mode collapse
    Required for the controllability claims.
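The role of the soft eligibility decay rate in the ledger above can be seen in a small sketch that contrasts a hard threshold with an exponentially decaying gate. The exact functional form is not given on this page, so `soft_gate`, the decay rate of 1.0, and the 5 m threshold are illustrative assumptions.

```python
import math


def hard_gate(value: float, threshold: float) -> float:
    """Binary eligibility: 1 inside the threshold, 0 beyond it."""
    return 1.0 if value <= threshold else 0.0


def soft_gate(value: float, threshold: float, decay_rate: float) -> float:
    """Weight 1 inside the threshold, decaying as exp(-decay_rate * excess) beyond it."""
    excess = max(0.0, value - threshold)
    return math.exp(-decay_rate * excess)


# An agent 1 m past a hypothetical 5 m threshold.
hard_weight = hard_gate(6.0, threshold=5.0)
soft_weight = soft_gate(6.0, threshold=5.0, decay_rate=1.0)
```

Under the hard gate the near-threshold agent contributes nothing, whereas the soft gate still gives it a weight of about 0.37. A larger `decay_rate` moves the soft gate back toward the hard one.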

pith-pipeline@v0.9.1-grok · 5735 in / 1416 out tokens · 27355 ms · 2026-07-03T10:38:02.829533+00:00 · methodology


Reference graph

Works this paper leans on

157 extracted references · 13 canonical work pages · 1 internal anchor

  1. [1]

    Weath- erdepth: Curriculum contrastive learning for self-supervised depth estimation under adverse weather conditions

    Yulong Cao, Boris Ivanovic, Chaowei Xiao, and Marco Pavone. Reinforcement learning with human feedback for realistic traffic simulation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 14428--14434, May 2024. doi:10.1109/ICRA57147.2024.10610878

  2. [2]

    Editing driver character: Socially-controllable behavior generation for interactive traffic simulation

    Wei-Jer Chang, Chen Tang, Chenran Li, Yeping Hu, Masayoshi Tomizuka, and Wei Zhan. Editing driver character: Socially-controllable behavior generation for interactive traffic simulation. IEEE Robotics and Automation Letters, 8 0 (9): 0 5432--5439, September 2023. doi:10.1109/LRA.2023.3291897

  3. [3]

    SAFE-SIM : Safety-critical closed-loop traffic simulation with diffusion-controllable adversaries

    Wei-Jer Chang, Francesco Pittaluga, Masayoshi Tomizuka, Wei Zhan, and Manmohan Chandraker. SAFE-SIM : Safety-critical closed-loop traffic simulation with diffusion-controllable adversaries. In European Conference on Computer Vision (ECCV), 2024

  4. [4]

    SPACeR : Self-play anchoring with centralized reference models

    Wei-Jer Chang, Akshay Rangesh, Kevin Joseph, Matthew Strong, Masayoshi Tomizuka, Yihan Hu, and Wei Zhan. SPACeR : Self-play anchoring with centralized reference models. In International Conference on Learning Representations (ICLR), 2026

  5. [5]

    Human-compatible driving partners through data-regularized self-play reinforcement learning, 2024

    Daphne Cornelisse and Eugene Vinitsky. Human-compatible driving partners through data-regularized self-play reinforcement learning, 2024

  6. [6]

    Robust autonomy emerges from self-play

    Marco Cusumano-Towner, David Hafner, Alexander Hertzberg, Brody Huval, Aleksei Petrenko, Eugene Vinitsky, Erik Wijmans, Taylor Killian, Stuart Bowers, Ozan Sener, Philipp Kraehenbuehl, and Vladlen Koltun. Robust autonomy emerges from self-play. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of PMLR, pages 11710--11737, 2025

  7. [7]

    Large scale interactive motion forecasting for autonomous driving: The W aymo open motion dataset

    Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Benjamin Sapp, Charles R Qi, Yin Zhou, et al. Large scale interactive motion forecasting for autonomous driving: The W aymo open motion dataset. In IEEE/CVF International Conference on Computer Vision (ICCV), 2021

  8. [8]

    Classifier-free diffusion guidance, 2022

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance, 2022

  9. [9]

    Solving motion planning tasks with a scalable generative model

    Yihan Hu, Siqi Chai, Zhening Yang, Jingyu Qian, Kun Li, Wenxin Shao, Haichao Zhang, Wei Xu, and Qiang Liu. Solving motion planning tasks with a scalable generative model. In Ale s Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, and G \"u l Varol, editors, Computer Vision -- ECCV 2024, pages 386--404, Cham, 2025. Springer Nature Switzerland

  10. [10]

    Versatile behavior diffusion for generalized traffic agent simulation

    Zhiyu Huang, Zixu Zhang, Ameya Vaidya, Yuxiao Chen, Jaime Fernández Fisac, and Chen Lv. Versatile behavior diffusion for generalized traffic agent simulation. IEEE Transactions on Intelligent Transportation Systems, pages 1--17, 2026

  11. [11]

    MotionDiffuser : Controllable multi-agent motion prediction using diffusion

    Chiyu Max Jiang, Andre Cornman, Cheolho Park, Benjamin Sapp, Yin Zhou, and Dragomir Anguelov. MotionDiffuser : Controllable multi-agent motion prediction using diffusion. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  12. [12]

    Scenediffuser: Efficient and controllable driving simulation initialization and rollout

    Chiyu Max Jiang, Yijing Bai, Andre Cornman, Christopher Davis, Xiukun Huang, Hong Jeon, Sakshum Kulshrestha, John Lambert, Shuangyu Li, Xuanyu Zhou, Carlos Fuertes, Chang Yuan, Mingxing Tan, Yin Zhou, and Dragomir Anguelov. Scenediffuser: Efficient and controllable driving simulation initialization and rollout. In Advances in Neural Information Processing...

  13. [13]

    Learning in graphical models

    Michael Irwin Jordan. Learning in graphical models. MIT press, 1999

  14. [14]

    Flow matching for generative modeling

    Yaron Lipman, Ricky T Q Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. In International Conference on Learning Representations (ICLR), 2023

  15. [15]

    Stage: Style-controllable action generation for personalized autonomous driving

    Zihao Liu, Xing Liu, Yizhai Zhang, and Panfeng Huang. Stage: Style-controllable action generation for personalized autonomous driving. IEEE Robotics and Automation Letters, 11 0 (2): 0 2130--2137, February 2026. doi:10.1109/LRA.2025.3640974

  16. [16]

    The W aymo open sim agents challenge

    Nico Montali, John Lambert, Paul Mougin, Alex Kuefler, Nicholas Rhinehart, Michelle Li, Cole Gulino, Tristan Emrich, Zoey Yang, Shimon Whiteson, et al. The W aymo open sim agents challenge. In Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2023

  17. [17]

    Wayformer: Motion forecasting via simple & efficient attention networks

    Nigamaa Nayakanti, Rami Al-Rfou, Aurick Zhou, Kratarth Goel, Khaled S Refaat, and Benjamin Sapp. Wayformer: Motion forecasting via simple & efficient attention networks. In IEEE International Conference on Robotics and Automation (ICRA), 2023

  18. [18]

    Scene transformer: A unified architecture for predicting multiple agent trajectories, 2022

    Jiquan Ngiam, Benjamin Caine, Vijay Vasudevan, Zhengdong Zhang, Hao-Tien Lewis Chiang, Jeffrey Ling, Rebecca Roelofs, Alex Bewley, Chenxi Liu, Ashish Venugopal, David Weiss, Ben Sapp, Zhifeng Chen, and Jonathon Shlens. Scene transformer: A unified architecture for predicting multiple agent trajectories, 2022

  19. [19]

    Advancing multi-agent traffic simulation via r1-style reinforcement fine-tuning, 2026

    Muleilan Pei, Shaoshuai Shi, and Shaojie Shen. Advancing multi-agent traffic simulation via r1-style reinforcement fine-tuning, 2026

  20. [20]

    Trajeglish : Traffic modeling as next-token prediction

    Jonah Philion, Xue Bin Peng, and Sanja Fidler. Trajeglish : Traffic modeling as next-token prediction. In International Conference on Learning Representations (ICLR), 2024

  21. [21]

    Scenario diffusion: Controllable driving scenario generation with diffusion

    Ethan Pronovost, Meghana Reddy Ganesina, Noureldin Hendy, Zeyu Wang, Andres Morales, Kai Wang, and Nick Roy. Scenario diffusion: Controllable driving scenario generation with diffusion. In Advances in Neural Information Processing Systems (NeurIPS), 2023

  22. [22]

    Guibas, Sanja Fidler, and Or Litany

    Davis Rempe, Jonah Philion, Leonidas J. Guibas, Sanja Fidler, and Or Litany. Generating useful accident-prone driving scenarios via a learned traffic prior. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  23. [23]

    CtRL-Sim : Reactive and controllable driving agents with offline reinforcement learning

    Luke Rowe, Roger Girgis, Anthony Gosselin, Bruno Carrez, Florian Golemo, Felix Heide, Liam Paull, and Christopher Pal. CtRL-Sim : Reactive and controllable driving agents with offline reinforcement learning. In Conference on Robot Learning (CoRL), 2024

  24. [24]

    Scenario dreamer: Vectorized latent diffusion for generating driving simulation environments

    Luke Rowe, Roger Girgis, Anthony Gosselin, Liam Paull, Christopher Pal, and Felix Heide. Scenario dreamer: Vectorized latent diffusion for generating driving simulation environments. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  25. [25]

    MotionLM : Multi-agent motion forecasting as language modeling

    Ari Seff, Brian Cera, Dian Chen, Mason Ng, Aurick Zhou, Nigamaa Nayakanti, Khaled S Refaat, Rami Al-Rfou, and Benjamin Sapp. MotionLM : Multi-agent motion forecasting as language modeling. In IEEE/CVF International Conference on Computer Vision (ICCV), 2023

  26. [26]

    Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying

    Shaoshuai Shi, Li Jiang, Dengxin Dai, and Bernt Schiele. Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46 0 (5): 0 3955--3971, May 2024. doi:10.1109/TPAMI.2024.3352811

  27. [27]

    TrafficSim : Learning to simulate realistic multi-agent behaviors

    Simon Suo, Sebastian Regalado, Sergio Casas, and Raquel Urtasun. TrafficSim : Learning to simulate realistic multi-agent behaviors. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

  28. [28]

    SceneDiffuser++ : City-scale traffic simulation via a generative world model

    Shuhan Tan, John Lambert, Hong Jeon, Sakshum Kulshrestha, Yijing Bai, Jing Luo, Dragomir Anguelov, Mingxing Tan, and Chiyu Max Jiang. SceneDiffuser++ : City-scale traffic simulation via a generative world model. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025 a

  29. [29]

    Flow matching-based autonomous driving planning with advanced interactive behavior modeling

    Tianyi Tan, Yinan Zheng, Ruiming Liang, Zexu Wang, Kexin ZHENG, Jinliang Zheng, Jianxiong Li, Xianyuan Zhan, and Jingjing Liu. Flow matching-based autonomous driving planning with advanced interactive behavior modeling. In Advances in Neural Information Processing Systems (NeurIPS), 2025 b

  30. [30]

    Smart: Scalable multi-agent real-time motion generation via next-token prediction

    Wei Wu, Xiaoxin Feng, Ziyan Gao, and Yuheng Kan. Smart: Scalable multi-agent real-time motion generation via next-token prediction. In Advances in Neural Information Processing Systems, volume 37, pages 114048--114071, 2024

  31. [31]

    GoalFlow : Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving

    Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, and Wei Yin. GoalFlow : Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  32. [32]

    Diverse critical interaction generation for planning and planner evaluation

    Zhao-Heng Yin, Lingfeng Sun, Liting Sun, Masayoshi Tomizuka, and Wei Zhan. Diverse critical interaction generation for planning and planner evaluation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7036--7043, September 2021. doi:10.1109/IROS51168.2021.9636266

  33. [33]

    Trajgen: Generating realistic and diverse trajectories with reactive and feasible agent behaviors for autonomous driving

    Qichao Zhang, Yinfeng Gao, Yikang Zhang, Youtian Guo, Dawei Ding, Yunpeng Wang, Peng Sun, and Dongbin Zhao. Trajgen: Generating realistic and diverse trajectories with reactive and feasible agent behaviors for autonomous driving. IEEE Transactions on Intelligent Transportation Systems, 23 0 (12): 0 24474--24487, December 2022. ISSN 1558-0016. doi:10.1109/...

  34. [34]

    TrafficBots : Towards world models for autonomous driving simulation and motion prediction

    Zhejun Zhang, Alexander Liniger, Dengxin Dai, Fisher Yu, and Luc Van Gool. TrafficBots : Towards world models for autonomous driving simulation and motion prediction. In IEEE International Conference on Robotics and Automation (ICRA), 2023

  35. [35]

    Closed-loop supervised fine-tuning of tokenized traffic models

    Zhejun Zhang, Peter Karkus, Maximilian Igl, Wenhao Ding, Yuxiao Chen, Boris Ivanovic, and Marco Pavone. Closed-loop supervised fine-tuning of tokenized traffic models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025 a

  36. [36]

    TrajTok : Technical report for 2025 Waymo open sim agents challenge

    Zhiyuan Zhang, Xiaosong Jia, Guanyu Chen, Qifeng Li, and Junchi Yan. TrajTok : Technical report for 2025 Waymo open sim agents challenge. Technical report, Shanghai Jiao Tong University, 2025 b

  37. [37]

    Language-guided traffic simulation via scene-level diffusion

    Ziyuan Zhong, Davis Rempe, Yuxiao Chen, Boris Ivanovic, Yulong Cao, Danfei Xu, Marco Pavone, and Baishakhi Ray. Language-guided traffic simulation via scene-level diffusion. In Conference on Robot Learning (CoRL), 2023 a

  38. [38]

    Guided conditional diffusion for controllable traffic simulation

    Ziyuan Zhong, Davis Rempe, Danfei Xu, Yuxiao Chen, Sushant Veer, Tong Che, Baishakhi Ray, and Marco Pavone. Guided conditional diffusion for controllable traffic simulation. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 3560--3566, 2023 b

  39. [39]

    BehaviorGPT : Smart agent simulation for autonomous driving with next-patch prediction

    Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, and Chun Jason Xue. BehaviorGPT : Smart agent simulation for autonomous driving with next-patch prediction. In Advances in Neural Information Processing Systems (NeurIPS), 2024

  40. [40]

    Chang, Wei-Jer and Rangesh, Akshay and Joseph, Kevin and Strong, Matthew and Tomizuka, Masayoshi and Hu, Yihan and Zhan, Wei , booktitle =

  41. [41]

    HIQL: Offline Goal-Conditioned RL with Latent States as Actions , volume =

    Park, Seohong and Ghosh, Dibya and Eysenbach, Benjamin and Levine, Sergey , booktitle =. HIQL: Offline Goal-Conditioned RL with Latent States as Actions , volume =

  42. [42]

    Offline Reinforcement Learning with Implicit

    Kostrikov, Ilya and Nair, Ashvin and Levine, Sergey , booktitle =. Offline Reinforcement Learning with Implicit

  43. [43]

    Conservative

    Kumar, Aviral and Zhou, Aurick and Tucker, George and Levine, Sergey , booktitle =. Conservative

  44. [44]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Decision Transformer: Reinforcement Learning via Sequence Modeling , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  45. [45]

    Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

    Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning , author =. arXiv preprint arXiv:1910.00177 , year =

  46. [46]

    Nair, Ashvin and Gupta, Abhishek and Dalal, Murtaza and Levine, Sergey , journal =

  47. [47]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    A Minimalist Approach to Offline Reinforcement Learning , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  48. [48]

    International Conference on Learning Representations (ICLR) , year =

    Learning to Reach Goals via Iterated Supervised Learning , author =. International Conference on Learning Representations (ICLR) , year =

  49. [49]

    Emmons, Scott and Eysenbach, Benjamin and Kostrikov, Ilya and Levine, Sergey , booktitle =

  50. [50]

    Rethinking Goal-Conditioned Supervised Learning and its Connection to Offline

    Yang, Rui and Lu, Yiming and Li, Wenhao and Sun, Hao and Fang, Meng and Du, Yali and Li, Xiu and Han, Lei and Zhang, Chongjie , booktitle =. Rethinking Goal-Conditioned Supervised Learning and its Connection to Offline

  51. [51]

    Advances in Neural Information Processing Systems (NeurIPS) , year =

    Contrastive Learning as Goal-Conditioned Reinforcement Learning , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

  52. [52]

    International Conference on Machine Learning (ICML) , year =

    Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning , author =. International Conference on Machine Learning (ICML) , year =

  53. [53]

    Ma, Yecheng Jason and Yan, Jason and Jayaraman, Dinesh and Bastani, Osbert. Offline Goal-Conditioned Reinforcement Learning via

  54. [54]

    Park, Seohong and Frans, Kevin and Eysenbach, Benjamin and Levine, Sergey.

  55. [55]

    Park, Seohong and Kreiman, Tobias and Levine, Sergey. Foundation Policies with

  56. [56]

    Rowe, Luke and Girgis, Roger and Gosselin, Anthony and Carrez, Bruno and Golemo, Florian and Heide, Felix and Paull, Liam and Pal, Christopher.

  57. [57]

    Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments.

  58. [58]

    Ajay, Anurag and Kumar, Aviral and Agrawal, Pulkit and Levine, Sergey and Nachum, Ofir.

  59. [59]

    Vezhnevets, Alexander Sasha and Osindero, Simon and Schaul, Tom and Heess, Nicolas and Jaderberg, Max and Silver, David and Kavukcuoglu, Koray.

  60. [60]

    Data-Efficient Hierarchical Reinforcement Learning. Advances in Neural Information Processing Systems (NeurIPS).

  61. [61]

    Learning Multi-Level Hierarchies with Hindsight. International Conference on Learning Representations (ICLR).

  62. [62]

    The Option-Critic Architecture. AAAI Conference on Artificial Intelligence (AAAI).

  63. [63]

    Deep Hierarchical Planning from Pixels. Advances in Neural Information Processing Systems (NeurIPS).

  64. [64]

    Relay Policy Learning: Solving Long Horizon Tasks via Imitation and Reinforcement Learning. Conference on Robot Learning (CoRL).

  65. [65]

    Projective Quasimetric Planning. arXiv preprint arXiv:2506.18847.

  66. [66]

    Flattening Hierarchies with Policy Bootstrapping. arXiv preprint arXiv:2505.14975.

  67. [67]

    Montali, Nico and Lambert, John and Mougin, Paul and Kuefler, Alex and Rhinehart, Nicholas and Li, Michelle and Gulino, Cole and Emrich, Tristan and Yang, Zoey and Whiteson, Shimon and others. The

  68. [68]

    Ettinger, Scott and Cheng, Shuyang and Caine, Benjamin and Liu, Chenxi and Zhao, Hang and Pradhan, Sabeek and Chai, Yuning and Sapp, Benjamin and Qi, Charles R and Zhou, Yin and others. Large Scale Interactive Motion Forecasting for Autonomous Driving: The

  69. [69]

    Motion Transformer with Global Intention Localization and Local Movement Refinement. Advances in Neural Information Processing Systems (NeurIPS).

  70. [70]

    Seff, Ari and Cera, Brian and Chen, Dian and Ng, Mason and Zhou, Aurick and Nayakanti, Nigamaa and Refaat, Khaled S and Al-Rfou, Rami and Sapp, Benjamin.

  71. [71]

    Philion, Jonah and Peng, Xue Bin and Fidler, Sanja.

  72. [72]

    Wu, Wei and Feng, Xiaoxin and Gao, Ziyan and Kan, Yuheng. SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction.

  73. [73]

    Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

  74. [74]

    Zhou, Zikang and Hu, Haibo and Chen, Xinhong and Wang, Jianping and Guan, Nan and Wu, Kui and Li, Yung-Hui and Huang, Yu-Kai and Xue, Chun Jason.

  75. [75]

    Zhao, Jianbo and Zhuang, Jiaheng and Zhou, Qibin and Ban, Taiyu and Xu, Ziyao and Zhou, Hangning and Wang, Junhe and Wang, Guoan and Li, Zhiheng and Li, Bin. KiGRAS: Kinematic-Driven Generative Model for Realistic Agent Simulation.

  76. [76]

    High-Dimensional Continuous Control Using Generalized Advantage Estimation. International Conference on Learning Representations (ICLR).

  77. [77]

    Huang, Zhiyu and Zhang, Zixu and Vaidya, Ameya and Chen, Yuxiao and Fernández Fisac, Jaime and Lv, Chen. Versatile Behavior Diffusion for Generalized Traffic Agent Simulation.

  78. [78]

    Gulino, Cole and Fu, Justin and Luo, Wenjie and Tucker, George and Bronstein, Eli and Lu, Yiren and Harb, Jean and Pan, Xinlei and Wang, Yan and Chen, Xiangyu and others.

  79. [79]

    Kazemkhani, Saman and Pandya, Aarav and Cornelisse, Daphne and Shacklett, Brennan and Vinitsky, Eugene.

  80. [80]

    Jiang, Chiyu Max and Bai, Yijing and Cornman, Andre and Davis, Christopher and Huang, Xiukun and Jeon, Hong and Kulshrestha, Sakshum and Lambert, John and Li, Shuangyu and Zhou, Xuanyu and Fuertes, Carlos and Yuan, Chang and Tan, Mingxing and Zhou, Yin and Anguelov, Dragomir. SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout.
