Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation

Christoph Stiller; Fabian Konstantinidis; Moritz Sackmann; Ulrich Hofmann

arxiv: 2512.05812 · v5 · submitted 2025-12-05 · 💻 cs.RO · cs.CV

Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation

Fabian Konstantinidis , Moritz Sackmann , Ulrich Hofmann , Christoph Stiller This is my paper

Pith reviewed 2026-05-17 00:53 UTC · model grok-4.3

classification 💻 cs.RO cs.CV

keywords multi-agent driving simulationinstance-centric representationbehavior modelingadversarial inverse reinforcement learningrelative positional encodingstraffic simulationrobust trajectory prediction

0 comments

The pith

Instance-centric local frames with relative encodings let behavior models for multi-agent driving simulation scale efficiently while improving accuracy and robustness over agent-centric baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a behavior model to control individual vehicles in multi-agent driving simulations that must remain both realistic and fast as the number of agents grows. Each traffic participant and map element is placed in its own local coordinate frame so that static map information can be encoded once and reused at every time step. A query-centric symmetric encoder uses relative positional encodings between these local frames to capture interactions without needing a single global viewpoint. Training relies on adversarial inverse reinforcement learning together with an adaptive reward transformation that automatically trades off realism against robustness. The resulting model reduces training and inference time as token count rises and produces more accurate and stable trajectory predictions than several agent-centric alternatives.

Core claim

By placing every traffic participant and map element inside its own local coordinate frame, the method obtains a viewpoint-invariant scene encoding that reuses static map tokens across simulation steps. Interactions are modeled through a query-centric symmetric context encoder that applies relative positional encodings between the local frames. Adversarial inverse reinforcement learning combined with an adaptive reward transformation learns the policy, yielding a behavior model whose training and inference cost grows more slowly with the number of tokens and whose positional accuracy and robustness exceed those of agent-centric baselines in multi-agent driving simulation.

What carries the argument

Instance-centric scene representation that encodes each agent and map element in its own local coordinate frame, paired with relative positional encodings inside a query-centric symmetric context encoder.

Load-bearing premise

The local frames and relative encodings between them are assumed to preserve every interaction detail that matters without losing context that would only be visible from a shared global viewpoint.

What would settle it

Run the learned policy on a set of held-out scenarios containing agents whose relative positions create strong viewpoint asymmetry or partial occlusions; if the instance-centric model then shows higher average displacement error than a matched agent-centric baseline, the sufficiency claim is falsified.

Figures

Figures reproduced from arXiv: 2512.05812 by Christoph Stiller, Fabian Konstantinidis, Moritz Sackmann, Ulrich Hofmann.

**Figure 2.** Figure 2: Single simulation step: The behavior model maps observations to [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Illustration of an example situation using instance-centric observa [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Illustration of the proposed instance-centric behavior model mapping observations to actions. Instance encoders convert observations into latent [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Regressed inference latency of a single policy-network forward [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Peak throughput of the behavior model. a) Inference Steps per Second [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

read the original abstract

Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Instance-centric local frames and relative encodings offer a practical efficiency boost for multi-agent driving sims, but the abstract's outperformance claims need concrete numbers to evaluate.

read the letter

I read the Konstantinidis et al. paper on behavior models for multi-agent driving simulation. The core idea is switching to an instance-centric representation so each agent and map element sits in its own local frame. This lets them reuse static map tokens across steps and pair it with a query-centric symmetric encoder that relies on relative positional encodings between frames. They then train via Adversarial Inverse Reinforcement Learning with an adaptive reward transformation that adjusts the balance between robustness and realism on the fly. Experiments are said to show better scaling with token count plus gains in positional accuracy and robustness over agent-centric baselines. The efficiency angle is the part that lands. Handling larger numbers of agents without quadratic blowup in compute is a genuine pain point in driving simulators, and the local-frame reuse plus relative encodings is a direct way to address it. The adaptive reward piece also looks like a useful practical addition to avoid hand-tuning during AIRL training. The abstract, however, gives no numbers, no error bars, no baseline details, and no mention of how they measured robustness. That leaves the central claim of outperformance hard to judge from what's visible. The stress-test point about possible loss of long-range or absolute context is worth watching in the full results; if relative encodings alone recover everything needed at intersections or merges, the paper should show it explicitly rather than assume it. This is aimed at researchers who build or use multi-agent simulators for autonomous-vehicle testing and want to push scale without extra hardware. Readers already working on scene representations or IRL for robotics might pick up the specific combination for their own setups. The work has enough of a concrete engineering contribution to go through peer review so the experiments can be checked properly.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an instance-centric scene representation for multi-agent driving simulation, in which each traffic participant and map element is encoded in its own local coordinate frame. This enables viewpoint-invariant encoding and reuse of static map tokens across timesteps. Interactions are modeled with a query-centric symmetric context encoder that employs relative positional encodings between local frames. Behavior is learned via Adversarial Inverse Reinforcement Learning (AIRL) together with a proposed adaptive reward transformation that balances robustness and realism. The central experimental claim is that the approach scales efficiently with token count, reduces training and inference time, and outperforms several agent-centric baselines on positional accuracy and robustness.

Significance. If the representation and learning claims are substantiated with quantitative evidence, the work could offer a practical route to scalable, realistic multi-agent simulation for autonomous-driving validation. The combination of local-frame efficiency, token reuse, and AIRL-based policy learning addresses both computational and behavioral realism bottlenecks that currently limit large-scale simulation.

major comments (2)

[§3.2] §3.2 (Instance-centric representation and relative encodings): The claim that relative positional encodings between local frames are sufficient to capture all relevant multi-agent interactions rests on an untested assumption. No ablation or analysis is provided showing that long-range relations (e.g., distant vehicles at an intersection or merging lane) are recovered without introducing viewpoint artifacts or loss of absolute context. This directly affects the robustness results reported in §4.
[§4] §4 (Experiments): The abstract and results section assert clear outperformance in positional accuracy and robustness together with large efficiency gains, yet no quantitative metrics (ADE/FDE, collision rates, timing numbers), error bars, baseline implementation details, or data-exclusion criteria are supplied. Without these, the central empirical claim cannot be evaluated or reproduced.

minor comments (2)

[Figure 1] Figure 1 or §3.1: A diagram explicitly showing the local-frame transformations and how relative encodings are computed between agents would improve clarity of the instance-centric design.
[§3] Notation in §3: The symbols used for local frames, query tokens, and the adaptive reward transformation should be defined in a single table or paragraph to avoid scattered definitions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and describe the revisions we will make to improve clarity, substantiation, and reproducibility.

read point-by-point responses

Referee: [§3.2] §3.2 (Instance-centric representation and relative encodings): The claim that relative positional encodings between local frames are sufficient to capture all relevant multi-agent interactions rests on an untested assumption. No ablation or analysis is provided showing that long-range relations (e.g., distant vehicles at an intersection or merging lane) are recovered without introducing viewpoint artifacts or loss of absolute context. This directly affects the robustness results reported in §4.

Authors: We appreciate the referee's point that an explicit ablation would provide stronger support. The query-centric symmetric context encoder with relative positional encodings is specifically designed to model interactions through relative geometry, which is viewpoint-invariant and avoids the need for absolute coordinates. Our robustness experiments in §4 already include complex multi-agent scenarios such as intersections and merges where long-range relations are present, and the performance gains over agent-centric baselines indicate that these relations are captured effectively. Nevertheless, we agree that a dedicated analysis would address the concern directly. In the revision we will add an ablation study in §4 that compares variants with and without relative positional encodings on subsets of scenes containing distant agents, reporting effects on both accuracy and robustness metrics. revision: yes
Referee: [§4] §4 (Experiments): The abstract and results section assert clear outperformance in positional accuracy and robustness together with large efficiency gains, yet no quantitative metrics (ADE/FDE, collision rates, timing numbers), error bars, baseline implementation details, or data-exclusion criteria are supplied. Without these, the central empirical claim cannot be evaluated or reproduced.

Authors: We regret that the quantitative details were not presented with sufficient prominence. Section 4 contains ADE/FDE values for positional accuracy, collision rates for robustness evaluation, and wall-clock timing measurements for training and inference efficiency, all compared against the listed agent-centric baselines. Error bars reflect standard deviation across three random seeds, baseline implementations follow the original papers with hyperparameters listed in the appendix, and data-exclusion criteria follow the standard train/validation splits of the nuScenes dataset with no additional filtering beyond scene length. To resolve the referee's concern we will (i) insert the key numerical results into the abstract, (ii) expand §4 with a consolidated results table that includes all metrics and error bars, and (iii) add a short paragraph detailing baseline re-implementation choices and data criteria. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper presents an instance-centric scene representation with relative positional encodings and an AIRL-based learning procedure with adaptive reward transformation. Claims of efficiency scaling and outperformance rest on experimental comparisons to external agent-centric baselines rather than any self-definitional reduction, fitted parameter renamed as prediction, or load-bearing self-citation chain. No equations or steps in the provided text equate outputs to inputs by construction; results are presented as empirically validated on benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.0 · 5437 in / 1033 out tokens · 65780 ms · 2026-05-17T00:53:40.776913+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we employ instance-centric observations, representing each instance... in its respective local coordinate frame... relative positional encoding between a target agent i and any of the combined instance tokens... ri→j = [Δαi→j, ψi→j, ∥pi→j∥]⊺

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 4 internal anchors

[1]

In [6], AIRL is used, where a discriminator is trained to distinguish real from simulated behavior, assigning higher scores to more realistic samples

address this issue by reconstructing a reward signal from real-world data. In [6], AIRL is used, where a discriminator is trained to distinguish real from simulated behavior, assigning higher scores to more realistic samples. As the goal is to drive as realistically as possible, the output of the discriminator is then used as a reward signal for RL traini...

work page
[2]

In [18], a global map is rasterized and encoded using a CNN

is the only work on learning a scene-centric multi-agent behavior model for closed-loop simulation. In [18], a global map is rasterized and encoded using a CNN. Then, local map features are extracted via Rotated Region of Interest Align and fused with the agent features. Lastly, a joint decoder model, realized as a message passing network, processes all a...

work page
[3]

to reconstruct a surrogate reward signal from real data, given as D={(o 1, a1),(o 2, a2), . . .}. In AIRL, an additional discriminator model Dϕ is trained to distinguish generated from real samples, outputting the probability Dϕ(o, a)∈[0,1] for the observation-action pair being real, i. e., stemming from D. The policy is trained via RL using the surrogate...

work page
[4]

Constant Velocity (CV): A learning-free baseline where agents are assumed to continue moving forward at a constant velocity

work page
[5]

LateFusionMLP[8]: Following [7], [8], [16], this compact agent-centric model consists solely of MLPs and max-pooling operations. We adopt the public implementation [8], replacing its discrete action decoder with ours to support continuous actions and training it within our framework for realistic behavior modeling

work page
[6]

GraphAIRL[6]: A more sophisticated agent-centric model that leverages a vectorized scene representation

work page
[7]

We evaluate two variants: 1) trained with c= 5 , as proposed in [6], and 2) trained with our proposed adaptive reward offset, defined in (3)

and attention-based interaction modeling. We evaluate two variants: 1) trained with c= 5 , as proposed in [6], and 2) trained with our proposed adaptive reward offset, defined in (3)

work page
[8]

Our agent-centric observations include both nearby agents and map elements within the observation radius

Behavior Cloning (BC): A supervised learning variant of our instance-centric approach, trained for 600 epochs by minimizing the negative log-likelihood of expert actions under the predicted action distribution. Our agent-centric observations include both nearby agents and map elements within the observation radius. The start and end points of a vector v a...

work page
[9]

Mixsim: A hierarchical framework for mixed reality traffic simulation,

S. Suo, K. Wong, J. Xu, J. Tu, A. Cui, S. Casas, and R. Urtasun, “Mixsim: A hierarchical framework for mixed reality traffic simulation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 9622–9631

work page 2023
[10]

Sledge: Synthesizing driving environments with generative models and rule-based traffic,

K. Chitta, D. Dauner, and A. Geiger, “Sledge: Synthesizing driving environments with generative models and rule-based traffic,” in European Conference on Computer Vision. Springer, 2024, pp. 57–74

work page 2024
[11]

Learning robust control policies for end-to- end autonomous driving from data-driven simulation,

A. Amini, I. Gilitschenski, J. Phillips, J. Moseyko, R. Banerjee, S. Karaman, and D. Rus, “Learning robust control policies for end-to- end autonomous driving from data-driven simulation,”IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1143–1150, 2020

work page 2020
[12]

Trafficbots: Towards world models for autonomous driving simulation and motion prediction,

Z. Zhang, A. Liniger, D. Dai, F. Yu, and L. Van Gool, “Trafficbots: Towards world models for autonomous driving simulation and motion prediction,” in2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 1522–1529

work page 2023
[13]

Modeling human driving behavior through generative adversarial imitation learning,

R. Bhattacharyya, B. Wulfe, D. J. Phillips, A. Kuefler, J. Morton, R. Senanayake, and M. J. Kochenderfer, “Modeling human driving behavior through generative adversarial imitation learning,”IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 3, pp. 2874–2887, 2022

work page 2022
[14]

Graph- based adversarial imitation learning for predicting human driving behavior,

F. Konstantinidis, M. Sackmann, U. Hofmann, and C. Stiller, “Graph- based adversarial imitation learning for predicting human driving behavior,” in2024 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2024, pp. 857–864

work page 2024
[15]

Robust autonomy emerges from self-play

M. Cusumano-Towner, D. Hafner, A. Hertzberg, B. Huval, A. Petrenko, E. Vinitsky, E. Wijmans, T. Killian, S. Bowers, O. Seneret al., “Robust autonomy emerges from self-play,”arXiv preprint arXiv:2502.03349, 2025

work page arXiv 2025
[16]

Building reliable sim driving agents by scaling self-play,

D. Cornelisse, A. Pandya, K. Joseph, J. Su ´arez, and E. Vinitsky, “Building reliable sim driving agents by scaling self-play,”arXiv preprint arXiv:2502.14706, 2025

work page arXiv 2025
[17]

ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

M. Bansal, A. Krizhevsky, and A. Ogale, “Chauffeurnet: Learning to drive by imitating the best and synthesizing the worst,”arXiv preprint arXiv:1812.03079, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[18]

Imitation is not enough: Robustifying imitation with reinforcement learning for challenging driving scenarios,

Y . Lu, J. Fu, G. Tucker, X. Pan, E. Bronstein, R. Roelofs, B. Sapp, B. White, A. Faust, S. Whitesonet al., “Imitation is not enough: Robustifying imitation with reinforcement learning for challenging driving scenarios,” in2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 7553–7560

work page 2023
[19]

Model-free deep reinforcement learning for urban autonomous driving,

J. Chen, B. Yuan, and M. Tomizuka, “Model-free deep reinforcement learning for urban autonomous driving,” in2019 IEEE intelligent transportation systems conference (ITSC). IEEE, 2019, pp. 2765– 2771

work page 2019
[20]

Modeling interaction-aware driving behavior using graph-based representations and multi-agent reinforcement learning,

F. Konstantinidis, M. Sackmann, U. Hofmann, and C. Stiller, “Modeling interaction-aware driving behavior using graph-based representations and multi-agent reinforcement learning,” in2023 IEEE 26th Interna- tional Conference on Intelligent Transportation Systems (ITSC). IEEE, 2023, pp. 1643–1650

work page 2023
[21]

Importance sampling-guided meta-training for intelligent agents in highly interactive environments,

M. Arief, M. Timmerman, J. Li, D. Isele, and M. J. Kochenderfer, “Importance sampling-guided meta-training for intelligent agents in highly interactive environments,”IEEE Robotics and Automation Letters, 2024

work page 2024
[22]

Learning robust rewards with adverse- rial inverse reinforcement learning,

J. Fu, K. Luo, and S. Levine, “Learning robust rewards with adverse- rial inverse reinforcement learning,” inInternational Conference on Learning Representations, 2018

work page 2018
[23]

Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research,

C. Gulino, J. Fu, W. Luo, G. Tucker, E. Bronstein, Y . Lu, J. Harb, X. Pan, Y . Wang, X. Chenet al., “Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research,”Advances in Neural Information Processing Systems, vol. 36, pp. 7730–7742, 2023

work page 2023
[24]

Gpudrive: Data-driven, multi-agent driving simulation at 1 million fps,

S. Kazemkhani, A. Pandya, D. Cornelisse, B. Shacklett, and E. Vinitsky, “Gpudrive: Data-driven, multi-agent driving simulation at 1 million fps,” arXiv preprint arXiv:2408.01584, 2024

work page arXiv 2024
[25]

Imagining the road ahead: Multi-agent trajectory prediction via differentiable simulation,

A. ´Scibior, V . Lioutas, D. Reda, P. Bateni, and F. Wood, “Imagining the road ahead: Multi-agent trajectory prediction via differentiable simulation,” in2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 720–725

work page 2021
[26]

Trafficsim: Learning to simulate realistic multi-agent behaviors,

S. Suo, S. Regalado, S. Casas, and R. Urtasun, “Trafficsim: Learning to simulate realistic multi-agent behaviors,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10 400–10 409

work page 2021
[27]

Scene transformer: A unified architecture for predicting multiple agent trajectories.arXiv preprint arXiv:2106.08417, 2021

J. Ngiam, B. Caine, V . Vasudevan, Z. Zhang, H.-T. L. Chiang, J. Ling, R. Roelofs, A. Bewley, C. Liu, A. Venugopalet al., “Scene transformer: A unified architecture for predicting multiple agent trajectories,”arXiv preprint arXiv:2106.08417, 2021

work page arXiv 2021
[28]

Simnet: Learning reactive self-driving simulations from real-world observations,

L. Bergamini, Y . Ye, O. Scheel, L. Chen, C. Hu, L. Del Pero, B. Osi´nski, H. Grimmett, and P. Ondruska, “Simnet: Learning reactive self-driving simulations from real-world observations,” in2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 5119–5125

work page 2021
[29]

General lane-changing model mobil for car-following models,

A. Kesting, M. Treiber, and D. Helbing, “General lane-changing model mobil for car-following models,”Transportation Research Record, vol. 1999, no. 1, pp. 86–94, 2007

work page 1999
[30]

Enhanced intelligent driver model to access the impact of driving strategies on traffic capacity,

——, “Enhanced intelligent driver model to access the impact of driving strategies on traffic capacity,”Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 368, no. 1928, pp. 4585–4605, 2010

work page 1928
[31]

Feedback in imitation learning: The three regimes of covariate shift,

J. Spencer, S. Choudhury, A. Venkatraman, B. Ziebart, and J. A. Bagnell, “Feedback in imitation learning: The three regimes of covariate shift,” arXiv preprint arXiv:2102.02872, 2021

work page arXiv 2021
[32]

Modelling two-dimensional driving behaviours at unsignalised intersection using multi-agent imitation learning,

J. Sun and J. Kim, “Modelling two-dimensional driving behaviours at unsignalised intersection using multi-agent imitation learning,” Transportation Research Part C: Emerging Technologies, vol. 165, p. 104702, 2024

work page 2024
[33]

Betail: Behavior transformer adversarial imitation learning from human racing gameplay,

C. Weaver, C. Tang, C. Hao, K. Kawamoto, M. Tomizuka, and W. Zhan, “Betail: Behavior transformer adversarial imitation learning from human racing gameplay,”IEEE Robotics and Automation Letters, 2024

work page 2024
[34]

Narrowing the coordinate-frame gap in behavior prediction models: Distillation for efficient and accurate scene-centric motion forecasting,

D. A. Su, B. Douillard, R. Al-Rfou, C. Park, and B. Sapp, “Narrowing the coordinate-frame gap in behavior prediction models: Distillation for efficient and accurate scene-centric motion forecasting,” in2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 653–659

work page 2022
[35]

Simpl: A simple and efficient multi-agent motion prediction baseline for autonomous driving,

L. Zhang, P. Li, S. Liu, and S. Shen, “Simpl: A simple and efficient multi-agent motion prediction baseline for autonomous driving,”IEEE Robotics and Automation Letters (RA-L), 2024

work page 2024
[36]

Query-centric trajectory prediction,

Z. Zhou, J. Wang, Y .-H. Li, and Y .-K. Huang, “Query-centric trajectory prediction,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 17 863–17 873

work page 2023
[37]

Real-time motion prediction via heterogeneous polyline transformer with relative pose encoding,

Z. Zhang, A. Liniger, C. Sakaridis, F. Yu, and L. V . Gool, “Real-time motion prediction via heterogeneous polyline transformer with relative pose encoding,”Advances in Neural Information Processing Systems, vol. 36, pp. 57 481–57 499, 2023

work page 2023
[38]

Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying,

S. Shi, L. Jiang, D. Dai, and B. Schiele, “Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 3955–3971, 2024

work page 2024
[39]

High-Dimensional Continuous Control Using Generalized Advantage Estimation

J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High- dimensional continuous control using generalized advantage estimation,” arXiv preprint arXiv:1506.02438, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[40]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Prox- imal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[41]

Vectornet: Encoding hd maps and agent dynamics from vectorized representation,

J. Gao, C. Sun, H. Zhao, Y . Shen, D. Anguelov, C. Li, and C. Schmid, “Vectornet: Encoding hd maps and agent dynamics from vectorized representation,” in2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11 522–11 530

work page 2020
[42]

Film: Visual reasoning with a general conditioning layer,

E. Perez, F. Strub, H. De Vries, V . Dumoulin, and A. Courville, “Film: Visual reasoning with a general conditioning layer,” inProceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1, 2018

work page 2018
[43]

Driving with llms: Fusing object- level vector modality for explainable autonomous driving,

L. Chen, O. Sinavski, J. H ¨unermann, A. Karnsund, A. J. Willmott, D. Birch, D. Maund, and J. Shotton, “Driving with llms: Fusing object- level vector modality for explainable autonomous driving,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024

work page 2024
[44]

Perceiver: General perception with iterative attention,

A. Jaegle, F. Gimeno, A. Brock, O. Vinyals, A. Zisserman, and J. Carreira, “Perceiver: General perception with iterative attention,” inInternational conference on machine learning. PMLR, 2021, pp. 4651–4664

work page 2021
[45]

Interaction dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps,

W. Zhan, L. Sun, D. Wang, H. Shi, A. Clausse, M. Naumann, J. Kummerle, H. Konigshof, C. Stiller, A. de La Fortelleet al., “Interaction dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps,” arXiv preprint arXiv:1910.03088, 2019

work page arXiv 1910
[46]

Highly accurate and diverse traffic data: The deepscenario open 3d dataset,

O. Dhaouadi, J. Meier, L. Wahl, J. Kaiser, L. Scalerandi, N. Wandelburg, Z. Zhou, N. Berinpanathan, H. Banzhaf, and D. Cremers, “Highly accurate and diverse traffic data: The deepscenario open 3d dataset,” arXiv preprint arXiv:2504.17371, 2025

work page arXiv 2025
[47]

Decoupled Weight Decay Regularization

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[1] [1]

In [6], AIRL is used, where a discriminator is trained to distinguish real from simulated behavior, assigning higher scores to more realistic samples

address this issue by reconstructing a reward signal from real-world data. In [6], AIRL is used, where a discriminator is trained to distinguish real from simulated behavior, assigning higher scores to more realistic samples. As the goal is to drive as realistically as possible, the output of the discriminator is then used as a reward signal for RL traini...

work page

[2] [2]

In [18], a global map is rasterized and encoded using a CNN

is the only work on learning a scene-centric multi-agent behavior model for closed-loop simulation. In [18], a global map is rasterized and encoded using a CNN. Then, local map features are extracted via Rotated Region of Interest Align and fused with the agent features. Lastly, a joint decoder model, realized as a message passing network, processes all a...

work page

[3] [3]

to reconstruct a surrogate reward signal from real data, given as D={(o 1, a1),(o 2, a2), . . .}. In AIRL, an additional discriminator model Dϕ is trained to distinguish generated from real samples, outputting the probability Dϕ(o, a)∈[0,1] for the observation-action pair being real, i. e., stemming from D. The policy is trained via RL using the surrogate...

work page

[4] [4]

Constant Velocity (CV): A learning-free baseline where agents are assumed to continue moving forward at a constant velocity

work page

[5] [5]

LateFusionMLP[8]: Following [7], [8], [16], this compact agent-centric model consists solely of MLPs and max-pooling operations. We adopt the public implementation [8], replacing its discrete action decoder with ours to support continuous actions and training it within our framework for realistic behavior modeling

work page

[6] [6]

GraphAIRL[6]: A more sophisticated agent-centric model that leverages a vectorized scene representation

work page

[7] [7]

We evaluate two variants: 1) trained with c= 5 , as proposed in [6], and 2) trained with our proposed adaptive reward offset, defined in (3)

and attention-based interaction modeling. We evaluate two variants: 1) trained with c= 5 , as proposed in [6], and 2) trained with our proposed adaptive reward offset, defined in (3)

work page

[8] [8]

Our agent-centric observations include both nearby agents and map elements within the observation radius

Behavior Cloning (BC): A supervised learning variant of our instance-centric approach, trained for 600 epochs by minimizing the negative log-likelihood of expert actions under the predicted action distribution. Our agent-centric observations include both nearby agents and map elements within the observation radius. The start and end points of a vector v a...

work page

[9] [9]

Mixsim: A hierarchical framework for mixed reality traffic simulation,

S. Suo, K. Wong, J. Xu, J. Tu, A. Cui, S. Casas, and R. Urtasun, “Mixsim: A hierarchical framework for mixed reality traffic simulation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 9622–9631

work page 2023

[10] [10]

Sledge: Synthesizing driving environments with generative models and rule-based traffic,

K. Chitta, D. Dauner, and A. Geiger, “Sledge: Synthesizing driving environments with generative models and rule-based traffic,” in European Conference on Computer Vision. Springer, 2024, pp. 57–74

work page 2024

[11] [11]

Learning robust control policies for end-to- end autonomous driving from data-driven simulation,

A. Amini, I. Gilitschenski, J. Phillips, J. Moseyko, R. Banerjee, S. Karaman, and D. Rus, “Learning robust control policies for end-to- end autonomous driving from data-driven simulation,”IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1143–1150, 2020

work page 2020

[12] [12]

Trafficbots: Towards world models for autonomous driving simulation and motion prediction,

Z. Zhang, A. Liniger, D. Dai, F. Yu, and L. Van Gool, “Trafficbots: Towards world models for autonomous driving simulation and motion prediction,” in2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 1522–1529

work page 2023

[13] [13]

Modeling human driving behavior through generative adversarial imitation learning,

R. Bhattacharyya, B. Wulfe, D. J. Phillips, A. Kuefler, J. Morton, R. Senanayake, and M. J. Kochenderfer, “Modeling human driving behavior through generative adversarial imitation learning,”IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 3, pp. 2874–2887, 2022

work page 2022

[14] [14]

Graph- based adversarial imitation learning for predicting human driving behavior,

F. Konstantinidis, M. Sackmann, U. Hofmann, and C. Stiller, “Graph- based adversarial imitation learning for predicting human driving behavior,” in2024 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2024, pp. 857–864

work page 2024

[15] [15]

Robust autonomy emerges from self-play

M. Cusumano-Towner, D. Hafner, A. Hertzberg, B. Huval, A. Petrenko, E. Vinitsky, E. Wijmans, T. Killian, S. Bowers, O. Seneret al., “Robust autonomy emerges from self-play,”arXiv preprint arXiv:2502.03349, 2025

work page arXiv 2025

[16] [16]

Building reliable sim driving agents by scaling self-play,

D. Cornelisse, A. Pandya, K. Joseph, J. Su ´arez, and E. Vinitsky, “Building reliable sim driving agents by scaling self-play,”arXiv preprint arXiv:2502.14706, 2025

work page arXiv 2025

[17] [17]

ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

M. Bansal, A. Krizhevsky, and A. Ogale, “Chauffeurnet: Learning to drive by imitating the best and synthesizing the worst,”arXiv preprint arXiv:1812.03079, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[18] [18]

Imitation is not enough: Robustifying imitation with reinforcement learning for challenging driving scenarios,

Y . Lu, J. Fu, G. Tucker, X. Pan, E. Bronstein, R. Roelofs, B. Sapp, B. White, A. Faust, S. Whitesonet al., “Imitation is not enough: Robustifying imitation with reinforcement learning for challenging driving scenarios,” in2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 7553–7560

work page 2023

[19] [19]

Model-free deep reinforcement learning for urban autonomous driving,

J. Chen, B. Yuan, and M. Tomizuka, “Model-free deep reinforcement learning for urban autonomous driving,” in2019 IEEE intelligent transportation systems conference (ITSC). IEEE, 2019, pp. 2765– 2771

work page 2019

[20] [20]

Modeling interaction-aware driving behavior using graph-based representations and multi-agent reinforcement learning,

F. Konstantinidis, M. Sackmann, U. Hofmann, and C. Stiller, “Modeling interaction-aware driving behavior using graph-based representations and multi-agent reinforcement learning,” in2023 IEEE 26th Interna- tional Conference on Intelligent Transportation Systems (ITSC). IEEE, 2023, pp. 1643–1650

work page 2023

[21] [21]

Importance sampling-guided meta-training for intelligent agents in highly interactive environments,

M. Arief, M. Timmerman, J. Li, D. Isele, and M. J. Kochenderfer, “Importance sampling-guided meta-training for intelligent agents in highly interactive environments,”IEEE Robotics and Automation Letters, 2024

work page 2024

[22] [22]

Learning robust rewards with adverse- rial inverse reinforcement learning,

J. Fu, K. Luo, and S. Levine, “Learning robust rewards with adverse- rial inverse reinforcement learning,” inInternational Conference on Learning Representations, 2018

work page 2018

[23] [23]

Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research,

C. Gulino, J. Fu, W. Luo, G. Tucker, E. Bronstein, Y . Lu, J. Harb, X. Pan, Y . Wang, X. Chenet al., “Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research,”Advances in Neural Information Processing Systems, vol. 36, pp. 7730–7742, 2023

work page 2023

[24] [24]

Gpudrive: Data-driven, multi-agent driving simulation at 1 million fps,

S. Kazemkhani, A. Pandya, D. Cornelisse, B. Shacklett, and E. Vinitsky, “Gpudrive: Data-driven, multi-agent driving simulation at 1 million fps,” arXiv preprint arXiv:2408.01584, 2024

work page arXiv 2024

[25] [25]

Imagining the road ahead: Multi-agent trajectory prediction via differentiable simulation,

A. ´Scibior, V . Lioutas, D. Reda, P. Bateni, and F. Wood, “Imagining the road ahead: Multi-agent trajectory prediction via differentiable simulation,” in2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 720–725

work page 2021

[26] [26]

Trafficsim: Learning to simulate realistic multi-agent behaviors,

S. Suo, S. Regalado, S. Casas, and R. Urtasun, “Trafficsim: Learning to simulate realistic multi-agent behaviors,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10 400–10 409

work page 2021

[27] [27]

Scene transformer: A unified architecture for predicting multiple agent trajectories.arXiv preprint arXiv:2106.08417, 2021

J. Ngiam, B. Caine, V . Vasudevan, Z. Zhang, H.-T. L. Chiang, J. Ling, R. Roelofs, A. Bewley, C. Liu, A. Venugopalet al., “Scene transformer: A unified architecture for predicting multiple agent trajectories,”arXiv preprint arXiv:2106.08417, 2021

work page arXiv 2021

[28] [28]

Simnet: Learning reactive self-driving simulations from real-world observations,

L. Bergamini, Y . Ye, O. Scheel, L. Chen, C. Hu, L. Del Pero, B. Osi´nski, H. Grimmett, and P. Ondruska, “Simnet: Learning reactive self-driving simulations from real-world observations,” in2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 5119–5125

work page 2021

[29] [29]

General lane-changing model mobil for car-following models,

A. Kesting, M. Treiber, and D. Helbing, “General lane-changing model mobil for car-following models,”Transportation Research Record, vol. 1999, no. 1, pp. 86–94, 2007

work page 1999

[30] [30]

Enhanced intelligent driver model to access the impact of driving strategies on traffic capacity,

——, “Enhanced intelligent driver model to access the impact of driving strategies on traffic capacity,”Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 368, no. 1928, pp. 4585–4605, 2010

work page 1928

[31] [31]

Feedback in imitation learning: The three regimes of covariate shift,

J. Spencer, S. Choudhury, A. Venkatraman, B. Ziebart, and J. A. Bagnell, “Feedback in imitation learning: The three regimes of covariate shift,” arXiv preprint arXiv:2102.02872, 2021

work page arXiv 2021

[32] [32]

Modelling two-dimensional driving behaviours at unsignalised intersection using multi-agent imitation learning,

J. Sun and J. Kim, “Modelling two-dimensional driving behaviours at unsignalised intersection using multi-agent imitation learning,” Transportation Research Part C: Emerging Technologies, vol. 165, p. 104702, 2024

work page 2024

[33] [33]

Betail: Behavior transformer adversarial imitation learning from human racing gameplay,

C. Weaver, C. Tang, C. Hao, K. Kawamoto, M. Tomizuka, and W. Zhan, “Betail: Behavior transformer adversarial imitation learning from human racing gameplay,”IEEE Robotics and Automation Letters, 2024

work page 2024

[34] [34]

Narrowing the coordinate-frame gap in behavior prediction models: Distillation for efficient and accurate scene-centric motion forecasting,

D. A. Su, B. Douillard, R. Al-Rfou, C. Park, and B. Sapp, “Narrowing the coordinate-frame gap in behavior prediction models: Distillation for efficient and accurate scene-centric motion forecasting,” in2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 653–659

work page 2022

[35] [35]

Simpl: A simple and efficient multi-agent motion prediction baseline for autonomous driving,

L. Zhang, P. Li, S. Liu, and S. Shen, “Simpl: A simple and efficient multi-agent motion prediction baseline for autonomous driving,”IEEE Robotics and Automation Letters (RA-L), 2024

work page 2024

[36] [36]

Query-centric trajectory prediction,

Z. Zhou, J. Wang, Y .-H. Li, and Y .-K. Huang, “Query-centric trajectory prediction,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 17 863–17 873

work page 2023

[37] [37]

Real-time motion prediction via heterogeneous polyline transformer with relative pose encoding,

Z. Zhang, A. Liniger, C. Sakaridis, F. Yu, and L. V . Gool, “Real-time motion prediction via heterogeneous polyline transformer with relative pose encoding,”Advances in Neural Information Processing Systems, vol. 36, pp. 57 481–57 499, 2023

work page 2023

[38] [38]

Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying,

S. Shi, L. Jiang, D. Dai, and B. Schiele, “Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 3955–3971, 2024

work page 2024

[39] [39]

High-Dimensional Continuous Control Using Generalized Advantage Estimation

J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High- dimensional continuous control using generalized advantage estimation,” arXiv preprint arXiv:1506.02438, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[40] [40]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Prox- imal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[41] [41]

Vectornet: Encoding hd maps and agent dynamics from vectorized representation,

J. Gao, C. Sun, H. Zhao, Y . Shen, D. Anguelov, C. Li, and C. Schmid, “Vectornet: Encoding hd maps and agent dynamics from vectorized representation,” in2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 11 522–11 530

work page 2020

[42] [42]

Film: Visual reasoning with a general conditioning layer,

E. Perez, F. Strub, H. De Vries, V . Dumoulin, and A. Courville, “Film: Visual reasoning with a general conditioning layer,” inProceedings of the AAAI conference on artificial intelligence, vol. 32, no. 1, 2018

work page 2018

[43] [43]

Driving with llms: Fusing object- level vector modality for explainable autonomous driving,

L. Chen, O. Sinavski, J. H ¨unermann, A. Karnsund, A. J. Willmott, D. Birch, D. Maund, and J. Shotton, “Driving with llms: Fusing object- level vector modality for explainable autonomous driving,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024

work page 2024

[44] [44]

Perceiver: General perception with iterative attention,

A. Jaegle, F. Gimeno, A. Brock, O. Vinyals, A. Zisserman, and J. Carreira, “Perceiver: General perception with iterative attention,” inInternational conference on machine learning. PMLR, 2021, pp. 4651–4664

work page 2021

[45] [45]

Interaction dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps,

W. Zhan, L. Sun, D. Wang, H. Shi, A. Clausse, M. Naumann, J. Kummerle, H. Konigshof, C. Stiller, A. de La Fortelleet al., “Interaction dataset: An international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps,” arXiv preprint arXiv:1910.03088, 2019

work page arXiv 1910

[46] [46]

Highly accurate and diverse traffic data: The deepscenario open 3d dataset,

O. Dhaouadi, J. Meier, L. Wahl, J. Kaiser, L. Scalerandi, N. Wandelburg, Z. Zhou, N. Berinpanathan, H. Banzhaf, and D. Cremers, “Highly accurate and diverse traffic data: The deepscenario open 3d dataset,” arXiv preprint arXiv:2504.17371, 2025

work page arXiv 2025

[47] [47]

Decoupled Weight Decay Regularization

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017