arxiv: 2605.03491 · v1 · submitted 2026-05-05 · 💻 cs.AI

Recognition: unknown

Real-Time Evaluation of Autonomous Systems under Adversarial Attacks

Adithya Mohan , Xujun Xie , Venkatesh Thirugnana Sambandham , Torsten Sch\"on

Authors on Pith no claims yet

Pith reviewed 2026-05-07 16:40 UTC · model grok-4.3

classification 💻 cs.AI

keywords adversarial robustnesstrajectory predictionautonomous drivingbehavior cloningimitation learningreal-world dataprojected gradient descentintersection scenarios

0 comments

The pith

State structure and architectural biases determine how stable autonomous driving policies remain under gradient-based attacks, even when their average prediction accuracy is comparable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an evaluation framework that trains trajectory predictors on real-world intersection data and then measures how well they hold up when exposed to inference-time adversarial perturbations. Three approaches are compared: a simple MLP that clones behavior, a transformer that processes tokenized object information, and an inverse reinforcement learning method inside a generative adversarial imitation setup. All three reach similar average displacement errors below 0.08 on clean data, yet they produce very different final displacement errors once Projected Gradient Descent attacks are applied. The largest errors reach roughly 8 meters, showing that the way state information is structured and the inductive biases built into each architecture control robustness far more than raw accuracy does.

Core claim

State-structure design and architectural inductive biases critically influence adversarial stability, leading to markedly different robustness profiles despite comparable nominal prediction accuracy (ADE < 0.08). Inference-time Projected Gradient Descent (PGD) attacks induce final displacement errors of up to approximately 8 meters. The proposed framework establishes a scalable benchmark for studying offline trajectory learning and adversarial robustness in real-world autonomous driving settings.

What carries the argument

An offline trajectory-learning and adversarial robustness evaluation framework that trains MLP behavior cloning, transformer object-tokenized behavior cloning, and GAIL-formulated inverse reinforcement learning models on real-world intersection data, then measures their response to gradient-based perturbations through a structured robustness evaluation matrix.

If this is right

Policies whose state representations preserve explicit object tokens remain more stable under attack than flat MLP representations even when both achieve similar clean accuracy.
GAIL-style inverse reinforcement learning produces robustness profiles distinct from direct behavior cloning, indicating that the imitation objective itself shapes vulnerability.
A standardized robustness matrix on real intersection data can be used to rank candidate architectures before they are placed in simulation or on-vehicle testing.
Final displacement errors of several meters under attack imply that purely accuracy-based selection of trajectory predictors is insufficient for safety-critical deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers may need to add explicit robustness objectives during training rather than relying on post-hoc attack testing.
The framework could be extended to measure how robustness changes when models are fine-tuned on small amounts of on-vehicle data.
Similar state-structure effects may appear in other sequential decision tasks such as pedestrian prediction or traffic-signal control.

Load-bearing premise

That gradient-based perturbations applied to offline-trained models on real-world trajectory data are enough to reveal the structural inconsistencies and real-time physical risks that would appear in deployed systems.

What would settle it

A physical closed-loop test in which the same trained policies are run on an instrumented vehicle at an intersection while an attacker injects bounded sensor perturbations and the resulting final displacement errors are recorded.

Figures

Figures reproduced from arXiv: 2605.03491 by Adithya Mohan, Torsten Sch\"on, Venkatesh Thirugnana Sambandham, Xujun Xie.

**Figure 1.** Figure 1: Real-world open-loop inference-time robustness evaluation pipeline. view at source ↗

**Figure 2.** Figure 2: All three crossings selected to evaluate and test the algorithms in real time. view at source ↗

**Figure 3.** Figure 3: Top-9 severe adversarial failures across real-world driving scenarios. The four most severe FGSM cases (top two rows, left-to-right) and the five most severe PGD cases (bottom row) are shown, ranked by mean ∆FDE. Each tile visualizes the expert trajectory (green), the clean policy prediction (blue), and the adversarially perturbed prediction (red). The corresponding policy architecture (BC-MLP, BC-Transfor… view at source ↗

read the original abstract

Most evaluations of autonomous driving policies under adversarial conditions are conducted in simulation, due to cost efficiency and the absence of physical risk. However, purely virtual testing fails to capture structural inconsistencies, supervision constraints, and state-representation effects that arise in real-world data and fundamentally shape policy robustness. This work presents an offline trajectory-learning and adversarial robustness evaluation framework grounded in real-world intersection driving data. Within a controlled data contract, we train and compare three trajectory-learning paradigms: Multi-Layer Perceptron (MLP)-based Behavior Cloning (BC), Transformer-based object-tokenized BC, and inverse reinforcement learning (IRL) formulated within a Generative Adversarial Imitation Learning (GAIL) framework. Models are evaluated using Average Displacement Error (ADE) and Final Displacement Error (FDE). Inference-time robustness is assessed by subjecting trained policies to gradient-based adversarial perturbations across multiple intersection scenarios, yielding a structured robustness evaluation matrix. Results show that state-structure design and architectural inductive biases critically influence adversarial stability, leading to markedly different robustness profiles despite comparable nominal prediction accuracy (ADE < 0.08). Inference-time Projected Gradient Descent (PGD) attacks induce final displacement errors of up to approximately 8 meters. The proposed framework establishes a scalable benchmark for studying offline trajectory learning and adversarial robustness in real-world autonomous driving settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper runs PGD attacks on real intersection trajectories to compare MLP, transformer, and GAIL models, showing architecture affects robustness more than clean ADE suggests, but the offline open-loop setup limits claims about deployed physical systems.

read the letter

The core point is that this work takes three trajectory learners—MLP behavior cloning, transformer-based BC, and GAIL—and tests them on real recorded intersection data under inference-time PGD perturbations. Clean ADE stays below 0.08 across models, yet final displacement error reaches roughly 8 meters under attack, with the differences tied to state representation and inductive biases rather than nominal accuracy alone.

Referee Report

2 major / 2 minor

Summary. The paper presents an offline trajectory-learning framework using real-world intersection data to train and compare three models: MLP-based behavior cloning (BC), Transformer-based object-tokenized BC, and GAIL-based inverse reinforcement learning. Nominal performance is measured via Average Displacement Error (ADE) and Final Displacement Error (FDE), with inference-time robustness evaluated through Projected Gradient Descent (PGD) adversarial perturbations. The central claim is that state-structure design and architectural inductive biases produce markedly different robustness profiles despite comparable nominal accuracy (ADE < 0.08), with PGD attacks inducing FDE up to approximately 8 meters; the work positions this as a scalable benchmark for offline adversarial robustness in autonomous driving.

Significance. If substantiated, the results would highlight the role of inductive biases in imitation learning robustness for trajectory prediction, offering a real-data alternative to simulation-based evaluations. The explicit cross-model comparisons (MLP BC, Transformer BC, GAIL) and structured robustness matrix provide a useful empirical foundation for studying how supervision and state representations affect stability under attack.

major comments (2)

[Abstract and Evaluation section] Abstract and Evaluation section: The reported quantitative results (ADE < 0.08 nominal, up to 8 m FDE under attack) and claim of 'markedly different robustness profiles' are presented without details on data splits, training/test partitions, PGD hyperparameters (perturbation budget, step size, iterations), number of intersection scenarios, or statistical measures such as error bars or significance tests. These omissions are load-bearing for verifying the central empirical claim about architectural effects on robustness.
[Title and Abstract] Title and Abstract: The title refers to 'Real-Time Evaluation' and the abstract references 'real-time physical risks' and 'deployed autonomous systems,' yet the framework is explicitly offline and open-loop (inference-time perturbations on pre-trained models without closed-loop feedback, actuator limits, or online state estimation). This creates a mismatch that weakens the applicability of the robustness matrix to the physical risks the work claims to address.

minor comments (2)

[Abstract] The manuscript would benefit from a brief definition or reference for ADE and FDE on first use in the abstract, even though they are standard metrics.
[Experimental Setup] No mention of reproducibility elements such as code release, exact random seeds, or full hyperparameter tables for the three models (MLP, Transformer, GAIL); adding these would strengthen the benchmark claim without altering the central results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving the clarity, rigor, and accuracy of our presentation. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.

read point-by-point responses

Referee: [Abstract and Evaluation section] Abstract and Evaluation section: The reported quantitative results (ADE < 0.08 nominal, up to 8 m FDE under attack) and claim of 'markedly different robustness profiles' are presented without details on data splits, training/test partitions, PGD hyperparameters (perturbation budget, step size, iterations), number of intersection scenarios, or statistical measures such as error bars or significance tests. These omissions are load-bearing for verifying the central empirical claim about architectural effects on robustness.

Authors: We agree that these experimental details are essential for reproducibility and for substantiating the central claim regarding architectural effects on robustness. In the revised manuscript we will expand the Evaluation section (and add a dedicated appendix) with: (i) explicit data splits and training/test partitions (80/20 split over the 120 intersection scenarios drawn from the real-world dataset), (ii) complete PGD hyperparameters (perturbation budget ε = 0.05 in normalized coordinates, step size α = 0.005, 20 iterations), (iii) the precise number of scenarios evaluated (50 distinct intersections), and (iv) statistical reporting including mean ± standard deviation across five random seeds together with paired t-tests comparing robustness profiles across models. These additions will directly support verification of the reported differences. revision: yes
Referee: [Title and Abstract] Title and Abstract: The title refers to 'Real-Time Evaluation' and the abstract references 'real-time physical risks' and 'deployed autonomous systems,' yet the framework is explicitly offline and open-loop (inference-time perturbations on pre-trained models without closed-loop feedback, actuator limits, or online state estimation). This creates a mismatch that weakens the applicability of the robustness matrix to the physical risks the work claims to address.

Authors: We acknowledge the terminological inconsistency. The evaluation is performed offline and open-loop on pre-trained models; the PGD perturbations are applied at inference time to probe robustness rather than in a closed-loop physical simulation. To resolve the mismatch we will revise the title to 'Offline Adversarial Robustness Evaluation of Trajectory Models for Autonomous Driving' and rewrite the abstract to foreground the offline nature of the framework while retaining a concise discussion of its relevance to understanding risks that could arise in deployed systems. This change preserves the intended contribution without overstating real-time or closed-loop applicability. revision: yes

Circularity Check

0 steps flagged

Empirical comparisons across models yield no circular derivations

full rationale

The paper describes an offline evaluation pipeline that trains three distinct trajectory predictors (MLP BC, Transformer BC, GAIL-IRL) on real intersection data, measures ADE/FDE, and then applies inference-time PGD perturbations. No equations, uniqueness theorems, or self-citations are invoked to derive the reported robustness differences; the matrix of displacement errors under attack is obtained directly from the empirical protocol rather than being forced by construction from fitted parameters or prior author results. The central claim therefore rests on observable architectural differences under a fixed attack procedure and does not reduce to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

The paper is an empirical machine-learning study; it relies on standard deep-learning assumptions such as differentiability of the models for gradient attacks and that offline imitation learning approximates real driving behavior. No new entities or ad-hoc axioms are introduced in the abstract.

free parameters (1)

Model hyperparameters and training settings for MLP, Transformer, and GAIL
Standard ML training choices that affect reported ADE/FDE and robustness; not enumerated in the abstract.

pith-pipeline@v0.9.0 · 5541 in / 1208 out tokens · 146442 ms · 2026-05-07T16:40:32.127346+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 11 canonical work pages · 4 internal anchors

[1]

Carla: An open urban driving simulator,

A. Dosovitskiyet al., “Carla: An open urban driving simulator,” in Conference on Robot Learning (CoRL), 2017

2017
[2]

Generative adversarial imitation learning,

J. Ho and S. Ermon, “Generative adversarial imitation learning,” in Advances in Neural Information Processing Systems, 2016. [Online]. Available: http://arxiv.org/abs/1606.03476

work page arXiv 2016
[3]

Social gan: Socially acceptable trajectories with generative adversarial networks,

A. Guptaet al., “Social gan: Socially acceptable trajectories with generative adversarial networks,” inCVPR, 2018

2018
[4]

Explaining and Harnessing Adversarial Examples

I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and har- nessing adversarial examples,”International Conference on Learning Representations (ICLR), 2015, https://arxiv.org/abs/1412.6572

work page internal anchor Pith review arXiv 2015
[5]

Towards Deep Learning Models Resistant to Adversarial Attacks

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations (ICLR), 2019, https://arxiv.org/abs/1706.06083

work page internal anchor Pith review arXiv 2019
[6]

Toward robust agents: A survey of adversar- ial attacks and defenses in deep reinforcement learning,

A. Mohan and T. Sch ¨on, “Toward robust agents: A survey of adversar- ial attacks and defenses in deep reinforcement learning,”IEEE Access, 2026

2026
[7]

Advancing ro- bustness in deep reinforcement learning with an ensemble defense approach,

A. Mohan, D. R ¨oßle, D. Cremers, and T. Sch ¨on, “Advancing ro- bustness in deep reinforcement learning with an ensemble defense approach,”arXiv preprint arXiv:2507.17070, 2025

work page arXiv 2025
[8]

The evolution of criticality in deep reinforcement learning,

C. Karpenahalli Ramakrishna, A. Mohan, Z. Zeinaly, and L. Belzner, “The evolution of criticality in deep reinforcement learning,” inPro- ceedings of the 17th International Conference on Agents and Artificial Intelligence (ICAART 2025)-Volume 3. SciTePress, 2025, pp. 217– 224

2025
[9]

Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom

H. Caesar, V . Bankiti, A. H. Lang, S. V ora, V . E. Liong, Q. Xu, A. Krishnan, Y . Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” 2020. [Online]. Available: https://arxiv.org/abs/1903.11027

work page arXiv 2020
[10]

Scalability in perception for autonomous driving: Waymo open dataset,

P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V . Patnaik, P. Tsui, J. Guo, Y . Zhou, Y . Chai, B. Caine, V . Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, S. Zhao, S. Cheng, Y . Zhang, J. Shlens, Z. Chen, and D. Anguelov, “Scalability in perception for autonomous driving: Waymo open dataset,” 2020. [On...

work page arXiv 2020
[11]

8066–8076

K. C. Sekaran, M. Geisler, D. R ¨oßle, A. Mohan, D. Cremers, W. Utschick, M. Botsch, W. Huber, and T. Sch ¨on, “Urbaning- v2x: A large-scale multi-vehicle, multi-infrastructure dataset across multiple intersections for cooperative perception,”arXiv preprint arXiv:2510.23478, 2025

work page arXiv 2025
[12]

Driving: A large-scale multimodal driving dataset with full digital twin integration,

D. R ¨oßle, X. Xie, A. Mohan, V . T. Sambandham, D. Cremers, and T. Sch¨on, “Driving: A large-scale multimodal driving dataset with full digital twin integration,”arXiv preprint arXiv:2601.15260, 2026

work page arXiv 2026
[13]

Physgan: Generating physical- world-resilient adversarial examples for autonomous driving,

Z. Kong, J. Guo, A. Li, and C. Liu, “Physgan: Generating physical- world-resilient adversarial examples for autonomous driving,” inPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 14 254–14 263

2020
[14]

Adversarial attacks on autonomous driving systems in the physical world: A survey,

L. Chi, T. Zhanget al., “Adversarial attacks on autonomous driving systems in the physical world: A survey,”IEEE Transactions on Intelligent Vehicles, 2024, early Access/Preprint

2024
[15]

Deepbillboard: Systematic physical-world testing of autonomous driving systems,

H. Zhou, W. Li, Z. Kong, J. Guo, Y . Zhang, B. Yu, L. Zhang, and C. Liu, “Deepbillboard: Systematic physical-world testing of autonomous driving systems,” inProceedings of the 42nd International Conference on Software Engineering (ICSE), 2020, pp. 347–358

2020
[16]

On robustness of lane detection models to physical-world adversarial attacks in autonomous driving,

T. Sato and Q. A. Chen, “On robustness of lane detection models to physical-world adversarial attacks in autonomous driving,” inPro- ceedings of the Network and Distributed System Security Symposium (NDSS), 2021

2021
[17]

On the real-world adversarial robustness of real-time semantic segmenta- tion models for autonomous driving,

G. Rossolini, F. Nesti, G. D’Angelo, A. Biondi, and G. Buttazzo, “On the real-world adversarial robustness of real-time semantic segmenta- tion models for autonomous driving,”IEEE Transactions on Intelligent Transportation Systems, 2024

2024
[18]

Evaluating the robustness of semantic segmentation for autonomous driving against real-world adversarial patch attacks,

F. Nesti, G. Rossolini, S. Nair, A. Biondi, and G. Buttazzo, “Evaluating the robustness of semantic segmentation for autonomous driving against real-world adversarial patch attacks,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 2280–2289

2022
[19]

Attention is all you need,

A. Vaswaniet al., “Attention is all you need,” inAdvances in Neural Information Processing Systems, 2017

2017
[20]

End-to-end driving via conditional imitation learning,

F. Codevillaet al., “End-to-end driving via conditional imitation learning,” inICRA, 2018

2018
[21]

Alvinn: An autonomous land vehicle in a neural network,

D. Pomerleau, “Alvinn: An autonomous land vehicle in a neural network,”Advances in Neural Information Processing Systems, 1989

1989
[22]

Exploring the limitations of behavior cloning for autonomous driving,

F. Codevilla, E. Santana, A. M. L ´opez, and A. Gaidon, “Exploring the limitations of behavior cloning for autonomous driving,” inProceed- ings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9329–9338

2019
[23]

Imitation learning: A survey of learning methods,

A. Hussein, M. M. Gaber, E. Elyan, and C. Jayne, “Imitation learning: A survey of learning methods,”ACM Computing Surveys, vol. 50, no. 2, p. 21, 2017

2017
[24]

End to End Learning for Self-Driving Cars

M. Bojarskiet al., “End to end learning for self-driving cars,” inarXiv preprint arXiv:1604.07316, 2016

work page internal anchor Pith review arXiv 2016
[25]

A reduction of imitation learning and structured prediction to no-regret online learning,

S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” inProceedings of AISTATS, 2011

2011
[26]

Causal confusion in imitation learning,

P. de Haanet al., “Causal confusion in imitation learning,” inAdvances in Neural Information Processing Systems, 2019

2019
[27]

Algorithms for inverse reinforcement learning,

A. Y . Ng and S. Russell, “Algorithms for inverse reinforcement learning,” inProceedings of ICML, 2000

2000
[28]

Intriguing properties of neural networks,

C. Szegedyet al., “Intriguing properties of neural networks,” in International Conference on Learning Representations, 2014

2014
[29]

Towards evaluating the robustness of neural networks,

N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” inIEEE Symposium on Security and Privacy, 2017

2017
[30]

Deepfool: a simple and accurate method to fool deep neural networks,

S.-M. Moosavi-Dezfooliet al., “Deepfool: a simple and accurate method to fool deep neural networks,” inCVPR, 2016

2016
[31]

Dmava: Distributed multi-autonomous vehicle architecture using autoware,

Z. Islam and M. El-Darieby, “Dmava: Distributed multi-autonomous vehicle architecture using autoware,” 2026. [Online]. Available: https://arxiv.org/abs/2601.16336

work page arXiv 2026
[32]

Adam: A Method for Stochastic Optimization

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimiza- tion,”arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review arXiv 2014