Forecasting the Past: Gradient-Based Distribution Shift Detection in Trajectory Prediction

Julian Wiederer; Michele De Vita; Vasileios Belagiannis

arxiv: 2604.12425 · v1 · submitted 2026-04-14 · 💻 cs.LG

Pith reviewed 2026-05-10 15:35 UTC · model grok-4.3

classification 💻 cs.LG

keywords prediction · shifts · trajectory · distribution · forecasting · decoder · detection · distributional

The pith

A gradient-based score from a post-hoc decoder trained to forecast the second half of trajectories detects distribution shifts without changing the original prediction model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes training an auxiliary decoder on the self-supervised task of predicting the latter part of observed trajectories from the earlier part. The L2 norm of the gradient of this forecasting loss relative to the decoder's final layer serves as an indicator for when the input distribution has shifted. This method leaves the main trajectory prediction model untouched, preserving its performance, and shows better detection of shifts on the Shifts and Argoverse datasets compared to prior approaches. It also applies to spotting potential collisions early in a motion planner. Readers should care because trajectory predictors in driving systems can fail dangerously under new conditions, and this offers a lightweight way to flag those cases.
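The scoring idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the final layer is taken to be a single linear map with bias, the loss is mean squared error, the decoder input is the flattened first half of the trajectory, and the hand-set constant-velocity weights `W_CV` and `B_CV` stand in for a trained decoder. Under those assumptions the last-layer gradient has a closed form, so no autograd library is needed.

```python
import math


def split_trajectory(traj):
    """Split an observed trajectory (list of (x, y) points) into two halves."""
    mid = len(traj) // 2
    return traj[:mid], traj[mid:]


def flatten(points):
    """Turn [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
    return [c for p in points for c in p]


def last_layer_grad_norm(W, b, h, target):
    """L2 norm of the MSE gradient w.r.t. (W, b) of a linear layer y = W h + b."""
    n = len(target)
    pred = [sum(W[i][j] * h[j] for j in range(len(h))) + b[i] for i in range(n)]
    # dL/dy_i for L = (1/n) * sum_i (pred_i - target_i)^2
    dy = [2.0 * (pred[i] - target[i]) / n for i in range(n)]
    # dL/dW[i][j] = dy[i] * h[j] and dL/db[i] = dy[i]
    sq = sum((dy[i] * h[j]) ** 2 for i in range(n) for j in range(len(h)))
    sq += sum(d ** 2 for d in dy)
    return math.sqrt(sq)


def shift_score(traj, W, b):
    """Forecast the second half from the first half and score the gradient norm."""
    first, second = split_trajectory(traj)
    return last_layer_grad_norm(W, b, flatten(first), flatten(second))


# Hypothetical hand-set weights: constant-velocity extrapolation of two points.
W_CV = [
    [-1.0, 0.0, 2.0, 0.0],  # x2 = 2*x1 - x0
    [0.0, -1.0, 0.0, 2.0],  # y2 = 2*y1 - y0
    [-2.0, 0.0, 3.0, 0.0],  # x3 = 3*x1 - 2*x0
    [0.0, -2.0, 0.0, 3.0],  # y3 = 3*y1 - 2*y0
]
B_CV = [0.0, 0.0, 0.0, 0.0]

straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
turning = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

score_straight = shift_score(straight, W_CV, B_CV)
score_turning = shift_score(turning, W_CV, B_CV)
```

In this toy setup the straight trajectory is forecast perfectly and scores zero, while the turning trajectory produces a nonzero gradient norm. The score is computed without touching any separate prediction model, which mirrors the post-hoc property described above.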

Core claim

The central claim is that the L2 norm of the gradient of an auxiliary self-supervised forecasting loss with respect to the decoder's final layer provides an effective score for detecting distribution shifts in trajectory prediction tasks, achieving substantial improvements on benchmark datasets while ensuring no interference with the original model's performance.

What carries the argument

The L2 norm of the gradient of the auxiliary forecasting loss with respect to the decoder's final layer, which acts as a distribution shift score.
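One way to write this score down is sketched here. The notation is an assumption introduced for illustration and is not taken from the paper: $x_{1:T}$ is an observed trajectory split at $T_1$, $\hat{x}_t$ is the auxiliary decoder's forecast for step $t$, $\theta_L$ denotes the parameters of the decoder's final layer, and the squared-error form of the loss is one plausible choice among several.

```latex
\mathcal{L}_{\text{aux}}(x;\theta)
  = \frac{1}{T - T_1} \sum_{t = T_1 + 1}^{T} \left\lVert \hat{x}_t - x_t \right\rVert_2^2,
\qquad
s(x) = \left\lVert \nabla_{\theta_L}\, \mathcal{L}_{\text{aux}}(x;\theta) \right\rVert_2
```

A sample would then be flagged as shifted when $s(x)$ crosses a threshold chosen on in-distribution data.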

Load-bearing premise

The L2 norm of the gradient of the auxiliary forecasting loss reliably indicates distribution shifts that matter for the downstream trajectory prediction task.

What would settle it

If experiments on the Shifts or Argoverse datasets show that the proposed gradient norm score does not outperform existing distribution shift detection methods in terms of detection accuracy or AUROC, the claim would be falsified.
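The AUROC comparison mentioned here can be sketched as follows. This is a minimal pairwise implementation, assuming that a higher score means "more likely shifted"; the score lists are made-up toy values, not results from the paper.

```python
def auroc(id_scores, ood_scores):
    """Probability that a random OOD score exceeds a random ID score.

    Ties count as half a win. Quadratic in the number of samples, which
    is fine for a sketch but slow for large evaluation sets.
    """
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Toy values for illustration only.
toy_id = [0.1, 0.2, 0.3]
toy_ood = [0.25, 0.5, 0.9]
toy_auroc = auroc(toy_id, toy_ood)  # 8 of 9 pairs are ranked correctly
```

An AUROC of 0.5 corresponds to chance, and a value below 0.5 means the score ranks shifted samples lower than in-distribution ones, so the sign convention of the score matters when comparing methods.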

Figures

Figures reproduced from arXiv: 2604.12425 by Julian Wiederer, Michele De Vita, Vasileios Belagiannis.

**Figure 2.** Overview of our post-hoc gradient-based distribution …

**Figure 3.** Qualitative example of our gradient-based distribution shift detection method. The figure shows trajectory samples from in …

**Figure 4.** Highway simulator scenarios depicting a merge crash, a roundabout crash, and normal roundabout navigation. We mark the start …

**Figure 6.** Gradient Distribution for In-Distribution vs. Out-Of …

**Figure 7.** Kernel density estimation (KDE) [8] of the last-layer gradients in the Highway environment [21] for the intersection driving task. We observed very distinct gradients between ID and OOD samples, leading to almost perfect collision detection.

On Argoverse, OOD samples often show lower gradient norms than ID samples. We hypothesize from our experiments that this failure is due to training failure …

**Figure 9.** Gradient vs loss-based OOD detection on the Shifts …

Original abstract

Trajectory prediction models often fail in real-world automated driving due to distributional shifts between training and test conditions. Such distributional shifts, whether behavioural or environmental, pose a critical risk by causing the model to make incorrect forecasts in unfamiliar situations. We propose a self-supervised method that trains a decoder in a post-hoc fashion on the self-supervised task of forecasting the second half of observed trajectories from the first half. The L2 norm of the gradient of this forecasting loss with respect to the decoder's final layer defines a score to identify distribution shifts. Our approach, first, does not affect the trajectory prediction model, ensuring no interference with original prediction performance and second, demonstrates substantial improvements on distribution shift detection for trajectory prediction on the Shifts and Argoverse datasets. Moreover, we show that this method can also be used to early detect collisions of a deep Q-Network motion planner in the Highway simulator. Source code is available at https://github.com/Michedev/forecasting-the-past.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

A non-intrusive post-hoc decoder using gradient norms on half-trajectory forecasting gives a workable shift detector for trajectory predictors, though the link to actual prediction errors still needs checking.

This paper gives a non-intrusive way to spot distribution shifts in trajectory predictors by training a decoder after the fact on forecasting the second half of a trajectory from the first half, then using the L2 norm of the loss gradient with respect to the decoder's final layer as the detection score. The original predictor stays untouched, which avoids any interference with its performance. They also apply the same score to early collision detection in a DQN planner on the Highway simulator. Code is released, which is useful for replication.

The construction is straightforward and not a direct copy of prior shift-detection tricks in the cited work, so the specific auxiliary task plus gradient-norm scoring counts as the main novelty. It is tested on Shifts and Argoverse, with claims of substantial detection gains. That combination of non-intrusiveness and public code makes the idea easy to try in practice for driving applications.

The main soft spot is whether the auxiliary gradient norm reliably flags the shifts that actually degrade the main predictor's ADE or FDE. The decoder is fit separately on in-distribution data, so its sensitivity could diverge from the features the original model depends on. Shifts that matter for the predictor but leave the decoder's last layer stable might be missed, while unrelated sensitivities could trigger false alarms. The abstract asserts clear improvements but does not include the numbers, baselines, or ablations here, so the strength of that alignment is hard to judge without the full results.

This is for researchers working on monitoring and robustness for autonomous driving trajectory models, especially those already using datasets like Argoverse. A reader who needs a lightweight, model-agnostic detection tool would find it worth reading. It deserves a serious referee because the method is concrete, the application is relevant to safety, and the code allows direct inspection. I would send it to peer review.

Referee Report

3 major / 2 minor

Summary. The paper introduces a post-hoc self-supervised approach for distribution shift detection in trajectory prediction models. An auxiliary decoder is trained to forecast the second half of a trajectory from the first half; the L2 norm of the gradient of this auxiliary loss with respect to the decoder's final layer is used as a shift-detection score. The method is asserted to leave the original predictor untouched and to yield substantial improvements on the Shifts and Argoverse datasets, with an additional demonstration on early collision detection for a DQN planner in the Highway simulator.

Significance. If the auxiliary gradient norm reliably flags shifts that degrade the main predictor, the approach would supply a lightweight, non-intrusive monitoring tool for deployed trajectory models in autonomous driving. The post-hoc, self-supervised construction avoids any interference with original performance and re-uses existing trajectory data, which are practical advantages. Source-code release further supports reproducibility.

major comments (3)

[Experiments] The central claim that the L2 gradient norm on the auxiliary forecasting loss detects shifts relevant to the original predictor rests on an untested alignment between auxiliary sensitivity and main-task failure modes. No analysis is provided showing correlation between high detection scores and elevated ADE/FDE on the untouched trajectory model under the reported shifts (Experiments section).
[Section 4] Quantitative results for the claimed 'substantial improvements' on Shifts and Argoverse are not summarized with baselines, error bars, or ablation details on the auxiliary task and chosen gradient layer, making it impossible to assess whether the gains are robust or merely reflect the auxiliary decoder's own sensitivity (Section 4).
[Application to DQN planner] The extension to early collision detection in the DQN Highway planner requires clarification on how the auxiliary decoder and gradient score are adapted to the planner's state representation and what quantitative metric defines 'early' detection (final application paragraph).
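The alignment check requested in the first major comment could look roughly like the sketch below. It assumes the usual definitions of ADE (mean pointwise Euclidean error) and FDE (error at the final step) and uses a plain Pearson correlation between per-sample detection scores and per-sample ADE. The function names are hypothetical and no values from the paper are used.

```python
import math


def ade(pred, gt):
    """Average displacement error: mean Euclidean distance over all steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred, gt):
    """Final displacement error: Euclidean distance at the last step."""
    return math.dist(pred[-1], gt[-1])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def score_error_correlation(scores, preds, gts):
    """Correlate per-sample shift scores with the frozen predictor's ADE."""
    errors = [ade(p, g) for p, g in zip(preds, gts)]
    return pearson(scores, errors)
```

A correlation near zero under a known shift would support the referee's concern that the auxiliary score and main-task failures are misaligned. A rank correlation such as Spearman's may be a better fit if the relationship is monotone but not linear.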

minor comments (2)

[Abstract] The phrasing 'first, does not affect... and second, demonstrates' is grammatically awkward and should be restructured for clarity.
[Method] Notation for the auxiliary loss and the exact parameters θ (decoder final layer) should be defined explicitly in the method section to avoid ambiguity when readers reproduce the gradient computation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline revisions to strengthen the presentation of our results and claims.

Referee: [Experiments] The central claim that the L2 gradient norm on the auxiliary forecasting loss detects shifts relevant to the original predictor rests on an untested alignment between auxiliary sensitivity and main-task failure modes. No analysis is provided showing correlation between high detection scores and elevated ADE/FDE on the untouched trajectory model under the reported shifts (Experiments section).

Authors: We agree that an explicit analysis correlating the auxiliary gradient-norm scores with degradation in the original predictor's ADE/FDE would more directly substantiate the relevance of the detected shifts. The current validation relies on improved AUROC for shift detection on the Shifts and Argoverse benchmarks, where the shifts are known to impact trajectory prediction performance. To address this point, we will add a new analysis (including scatter plots or correlation coefficients) in the Experiments section of the revised manuscript that quantifies the relationship between high detection scores and elevated prediction errors on the frozen main model.

revision: yes

Referee: [Section 4] Quantitative results for the claimed 'substantial improvements' on Shifts and Argoverse are not summarized with baselines, error bars, or ablation details on the auxiliary task and chosen gradient layer, making it impossible to assess whether the gains are robust or merely reflect the auxiliary decoder's own sensitivity (Section 4).

Authors: The results in Section 4 compare our method against multiple distribution-shift detection baselines and report AUROC improvements. However, we acknowledge that the presentation would benefit from error bars across random seeds and expanded ablations on auxiliary-task hyperparameters and gradient-layer selection. In the revision we will include these elements (error bars from at least five runs and additional ablation tables) to demonstrate robustness and rule out sensitivity artifacts.

revision: yes

Referee: [Application to DQN planner] The extension to early collision detection in the DQN Highway planner requires clarification on how the auxiliary decoder and gradient score are adapted to the planner's state representation and what quantitative metric defines 'early' detection (final application paragraph).

Authors: The auxiliary decoder is trained on the same state trajectories used by the DQN planner (position, velocity, and heading sequences extracted from the simulator). The self-supervised forecasting task and gradient-norm computation are applied identically to the trajectory-prediction case. 'Early' detection is quantified by the number of timesteps before a collision at which the score exceeds a threshold, together with the resulting reduction in collision rate when the score triggers a safety intervention. We will expand the final paragraph with these details and add a short table of lead-time and collision-rate metrics in the revised manuscript.

revision: partial
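The lead-time metric described in the last response can be sketched as below. This is an illustrative reading of "number of timesteps before a collision at which the score exceeds a threshold": the function name, the strict `>` comparison, and the toy episode are assumptions, not details from the paper.

```python
def detection_lead_time(scores, threshold, collision_step):
    """Steps between the first alarm and the collision.

    `scores[t]` is the shift score at timestep t and `collision_step` is
    the index of the timestep at which the collision occurs. Returns None
    when no score exceeds the threshold before the collision.
    """
    for t, s in enumerate(scores[:collision_step]):
        if s > threshold:
            return collision_step - t
    return None


# Toy episode for illustration only: the collision happens at step 4.
toy_scores = [0.1, 0.2, 0.9, 1.5, 2.0]
toy_lead = detection_lead_time(toy_scores, threshold=0.8, collision_step=4)
```

Here the first alarm fires at step 2, giving a lead time of 2 steps. Averaging this quantity over episodes, alongside the rate of episodes with no alarm, would give the kind of table the authors promise.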

Circularity Check

0 steps flagged

No circularity: auxiliary gradient norm defined independently of main-task performance

The paper defines its distribution-shift score directly as the L2 norm of the gradient of a post-hoc auxiliary self-supervised forecasting loss (second-half trajectory from first half) with respect to the decoder's final layer. This construction is independent of the original trajectory predictor's ADE/FDE or any fitted parameters from the main task. No equations reduce the score to a self-referential quantity, a fitted input renamed as prediction, or a self-citation chain. The method is explicitly post-hoc and non-interfering. Evaluation on Shifts and Argoverse is empirical comparison, not a derivation that collapses to the inputs by construction. This is the normal non-circular case.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard machine-learning assumptions about the feasibility of self-supervised training on trajectory splits and the informativeness of gradients for shift detection; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Observed trajectories can be split into first and second halves that support meaningful self-supervised forecasting. Invoked in the definition of the auxiliary task.

pith-pipeline@v0.9.0 · 5470 in / 1213 out tokens · 47203 ms · 2026-05-10T15:35:03.286900+00:00 · methodology

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

[1] Ehsan Ahmadi, Ray Mercurius, Soheil Alizadeh, Kasra Rezaee, and Amir Rasouli. Curb your attention: Causal attention gating for robust trajectory prediction in autonomous driving. arXiv preprint arXiv:2410.07191, 2024.

[2] Görkay Aydemir, Adil Kaan Akan, and Fatma Güney. Adapt: Efficient multi-agent trajectory prediction with adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8295–8305, 2023.

[3] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

[4] Neeloy Chakraborty, Aamir Hasan, Shuijing Liu, Tianchen Ji, Weihang Liang, D. Livingston McPherson, and Katherine Driggs-Campbell. Structural attention-based recurrent variational autoencoder for highway vehicle anomaly detection. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, page 1125–1134, Richland...

[5] Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, and James Hays. Argoverse: 3d tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[6] Ming-Fang Chang, John W Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, and James Hays. Argoverse: 3d tracking and forecasting with rich maps. In Conference on Computer Vision and Pattern Recognition (CVPR),

[7] Weihuang Chen, Fangfang Wang, and Hongbin Sun. S2tnet: Spatio-temporal transformer networks for trajectory prediction in autonomous driving. In Proceedings of The 13th Asian Conference on Machine Learning, pages 454–469. PMLR, 2021.

[8] Yen-Chi Chen. A tutorial on kernel density estimation and recent advances. Biostatistics & Epidemiology, 1(1):161–187, 2017.

[9] Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In International conference on machine learning, pages 794–803. PMLR, 2018.

[10] Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, and Alexey Dosovitskiy. End-to-end driving via conditional imitation learning. In 2018 IEEE international conference on robotics and automation (ICRA), pages 4693–

[11] Mostafa ElAraby, Sabyasachi Sahoo, Yann Pequignot, Paul Novello, and Liam Paull. Grood: Gradient-aware out-of-distribution detection. Transactions on Machine Learning Research, 2024.

[12] Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R. Qi, Yin Zhou, Zoey Yang, Aurélien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. Large scale interactive motion forecasting for autonomous driving: The waymo open motion...

[13] Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud Ben Amor, Éloi Zablocki, Matthieu Cord, and Alexandre Alahi. Unitraj: A unified framework for scalable vehicle trajectory prediction. In Computer Vision – ECCV 2024, pages 106–123, Cham, 2025. Springer Nature Switzerland.

[14] Angelos Filos, Panagiotis Tigas, Rowan McAllister, Nicholas Rhinehart, Sergey Levine, and Yarin Gal. Can autonomous vehicles identify, recover from, and adapt to distribution shifts? In International Conference on Machine Learning (ICML), 2020.

[15] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In International conference on machine learning, pages 1050–1059. PMLR, 2016.

[16] Yang Gao, Po-Chien Luan, and Alexandre Alahi. Multi-transmotion: Pre-trained model for human motion prediction. In 8th Annual Conference on Robot Learning, 2024.

[17] Marc Hölle, Walter Kellermann, and Vasileios Belagiannis. Uncertainty-aware likelihood ratio estimation for pixel-wise out-of-distribution detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 772–782, 2025.

[18] Julia Hornauer and Vasileios Belagiannis. Heatmap-based out-of-distribution detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2603–2612, 2023.

[19] Rui Huang, Andrew Geng, and Yixuan Li. On the importance of gradients for detecting distributional shifts in the wild. Advances in Neural Information Processing Systems, 34:677–689, 2021.

[20] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in neural information processing systems, 30, 2017.

[21] Edouard Leurent. An environment for autonomous driving decision-making. https://github.com/eleurent/highway-env, 2018.

[22] Chaoneng Li, Guanwen Feng, Yunan Li, Ruyi Liu, Qiguang Miao, and Liang Chang. Difftad: Denoising diffusion probabilistic models for vehicle trajectory anomaly detection. Knowledge-Based Systems, 286:111387, 2024.

[23] Shiyu Liang, Yixuan Li, and R. Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In International Conference on Learning Representations, 2018.

[24] Yunxiang Liu and Hongkuo Niu. Dyttp: Trajectory prediction with normalization-free transformers, 2025.

[25] Andrey Malinin, Neil Band, German Chesnokov, Yarin Gal, Mark JF Gales, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, et al. Shifts: A dataset of real distributional shift across multiple large-scale tasks. arXiv preprint arXiv:2107.07455, 2021.

[26] Sajad Marvi, Christoph Rist, Julian Schmidt, Julian Jordan, and Abhinav Valada. Evidential uncertainty estimation for multi-modal trajectory prediction. arXiv preprint arXiv:2503.05274, 2025.

[27] Armin Danesh Pazho, Ghazal Alinezhad Noghre, Vinit Katariya, and Hamed Tabkhi. Vt-former: An exploratory study on vehicle trajectory prediction for highway surveillance through graph isomorphism and transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 5651–5662, 2024.

[28] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8,

[29] Nicholas Rhinehart, Rowan McAllister, and Sergey Levine. Deep imitative models for flexible inference, planning, and control. In International Conference on Learning Representations, 2020.

[30] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.

[31] Julian Schmidt, Julian Jordan, David Raba, Tobias Welz, and Klaus Dietmayer. Meat: Maneuver extraction from agent trajectories. In 2022 IEEE Intelligent Vehicles Symposium (IV), pages 1810–1816. IEEE, 2022.

[32] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[33] Benjamin Stoler, Ingrid Navarro, Meghdeep Jana, Soonmin Hwang, Jonathan Francis, and Jean Oh. Safeshift: Safety-informed distribution shifts for robust trajectory prediction in autonomous driving. In 2024 IEEE Intelligent Vehicles Symposium (IV), pages 1179–1186, 2024.

[34] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

[35] Royden Wagner, Omer Sahin Tas, Marvin Klemp, and Carlos Fernandez. Jointmotion: Joint self-supervision for joint motion prediction. In 8th Annual Conference on Robot Learning, 2024.

[36] Julian Wiederer, Arij Bouazizi, Marco Troina, Ulrich Kressel, and Vasileios Belagiannis. Anomaly detection in multi-agent trajectories for automated driving. In Conference on Robot Learning, 2021.

[37] Julian Wiederer, Arij Bouazizi, Marco Troina, Ulrich Kressel, and Vasileios Belagiannis. Anomaly detection in multi-agent trajectories for automated driving. In Proceedings of the 5th Conference on Robot Learning, pages 1223–1233. PMLR, 2022.

[38] Julian Wiederer, Julian Schmidt, Ulrich Kressel, Klaus Dietmayer, and Vasileios Belagiannis. Joint out-of-distribution detection and uncertainty estimation for trajectory prediction. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023.

[39] Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, and James Hays. Argoverse 2: Next generation datasets for self-driving perception and forecasting, 2023.

[40] Yue Yao, Shengchao Yan, Daniel Goehring, Wolfram Burgard, and Joerg Reichardt. Improving out-of-distribution generalization of trajectory prediction for autonomous driving via polynomial representations. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 488–495, 2024.

[41] Yu Yao, Salil Bhatnagar, Markus Mazzola, Vasileios Belagiannis, Igor Gilitschenski, Luigi Palmieri, Simon Razniewski, and Marcel Hallgarten. Agents-llm: Augmentative generation of challenging traffic scenarios with an agentic llm framework. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 18400–18407, 2025.

[42] Wei Zhan, Liting Sun, Di Wang, Haojie Shi, Aubrey Clausse, Maximilian Naumann, Julius Kümmerle, Hendrik Königshof, Christoph Stiller, Arnaud de La Fortelle, and Masayoshi Tomizuka. INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps. arXiv:1910.03088 [cs, eess], 2019.

[43] Yilin Zhang, Cai Xu, You Wu, Ziyu Guan, and Wei Zhao. Gradient rectification for robust calibration under distribution shift. arXiv preprint arXiv:2508.19830, 2025.

[44] Zikang Zhou, Luyao Ye, Jianping Wang, Kui Wu, and Kejie Lu. Hivt: Hierarchical vector transformer for multi-agent motion prediction. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8813–8823, 2022.

[1] [1]

Curb Your Attention: Causal Attention Gating for Robust Trajectory Prediction in Autonomous Driving,

Ehsan Ahmadi, Ray Mercurius, Soheil Alizadeh, Kasra Rezaee, and Amir Rasouli. Curb your attention: Causal at- tention gating for robust trajectory prediction in autonomous driving.arXiv preprint arXiv:2410.07191, 2024. 4

work page internal anchor Pith review arXiv 2024

[2] [2]

Adapt: Efficient multi-agent trajectory prediction with adaptation

G ¨orkay Aydemir, Adil Kaan Akan, and Fatma G¨uney. Adapt: Efficient multi-agent trajectory prediction with adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8295–8305, 2023. 2

work page 2023

[3] [3]

Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Gi- ancarlo Baldan, and Oscar Beijbom

Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Gi- ancarlo Baldan, and Oscar Beijbom. nuscenes: A multi- modal dataset for autonomous driving. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 2

work page 2020

[4] Neeloy Chakraborty, Aamir Hasan, Shuijing Liu, Tianchen Ji, Weihang Liang, D. Livingston McPherson, and Katherine Driggs-Campbell. Structural attention-based recurrent variational autoencoder for highway vehicle anomaly detection. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, page 1125–1134, Richland...

[5] Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, and James Hays. Argoverse: 3d tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[6] Ming-Fang Chang, John W Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, and James Hays. Argoverse: 3d tracking and forecasting with rich maps. In Conference on Computer Vision and Pattern Recognition (CVPR),

[7] Weihuang Chen, Fangfang Wang, and Hongbin Sun. S2tnet: Spatio-temporal transformer networks for trajectory prediction in autonomous driving. In Proceedings of The 13th Asian Conference on Machine Learning, pages 454–469. PMLR, 2021.

[8] Yen-Chi Chen. A tutorial on kernel density estimation and recent advances. Biostatistics & Epidemiology, 1(1):161–187, 2017.

[9] Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In International conference on machine learning, pages 794–803. PMLR, 2018.

[10] Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, and Alexey Dosovitskiy. End-to-end driving via conditional imitation learning. In 2018 IEEE international conference on robotics and automation (ICRA), pages 4693–

[11] Mostafa ElAraby, Sabyasachi Sahoo, Yann Pequignot, Paul Novello, and Liam Paull. Grood: Gradient-aware out-of-distribution detection. Transactions on Machine Learning Research, 2024.

[12] Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R. Qi, Yin Zhou, Zoey Yang, Aurélien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. Large scale interactive motion forecasting for autonomous driving: The waymo open motion...

[13] Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud Ben Amor, Éloi Zablocki, Matthieu Cord, and Alexandre Alahi. Unitraj: A unified framework for scalable vehicle trajectory prediction. In Computer Vision – ECCV 2024, pages 106–123, Cham, 2025. Springer Nature Switzerland.

[14] Angelos Filos, Panagiotis Tigas, Rowan McAllister, Nicholas Rhinehart, Sergey Levine, and Yarin Gal. Can autonomous vehicles identify, recover from, and adapt to distribution shifts? In International Conference on Machine Learning (ICML), 2020.

[15] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning, pages 1050–1059. PMLR, 2016.

[16] Yang Gao, Po-Chien Luan, and Alexandre Alahi. Multi-transmotion: Pre-trained model for human motion prediction. In 8th Annual Conference on Robot Learning, 2024.

[17] Marc Hölle, Walter Kellermann, and Vasileios Belagiannis. Uncertainty-aware likelihood ratio estimation for pixel-wise out-of-distribution detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 772–782, 2025.

[18] Julia Hornauer and Vasileios Belagiannis. Heatmap-based out-of-distribution detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2603–2612, 2023.

[19] Rui Huang, Andrew Geng, and Yixuan Li. On the importance of gradients for detecting distributional shifts in the wild. Advances in Neural Information Processing Systems, 34:677–689, 2021.

[20] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in neural information processing systems, 30, 2017.

[21] Edouard Leurent. An environment for autonomous driving decision-making. https://github.com/eleurent/highway-env, 2018.

[22] Chaoneng Li, Guanwen Feng, Yunan Li, Ruyi Liu, Qiguang Miao, and Liang Chang. Difftad: Denoising diffusion probabilistic models for vehicle trajectory anomaly detection. Knowledge-Based Systems, 286:111387, 2024.

[23] Shiyu Liang, Yixuan Li, and R. Srikant. Enhancing the reliability of out-of-distribution image detection in neural networks. In International Conference on Learning Representations, 2018.

[24] Yunxiang Liu and Hongkuo Niu. Dyttp: Trajectory prediction with normalization-free transformers, 2025.

[25] Andrey Malinin, Neil Band, German Chesnokov, Yarin Gal, Mark JF Gales, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, et al. Shifts: A dataset of real distributional shift across multiple large-scale tasks. arXiv preprint arXiv:2107.07455, 2021.

[26] Sajad Marvi, Christoph Rist, Julian Schmidt, Julian Jordan, and Abhinav Valada. Evidential uncertainty estimation for multi-modal trajectory prediction. arXiv preprint arXiv:2503.05274, 2025.

[27] Armin Danesh Pazho, Ghazal Alinezhad Noghre, Vinit Katariya, and Hamed Tabkhi. Vt-former: An exploratory study on vehicle trajectory prediction for highway surveillance through graph isomorphism and transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 5651–5662, 2024.

[28] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8,

[29] Nicholas Rhinehart, Rowan McAllister, and Sergey Levine. Deep imitative models for flexible inference, planning, and control. In International Conference on Learning Representations, 2020.

[30] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. nature, 323(6088):533–536, 1986.

[31] Julian Schmidt, Julian Jordan, David Raba, Tobias Welz, and Klaus Dietmayer. Meat: Maneuver extraction from agent trajectories. In 2022 IEEE Intelligent Vehicles Symposium (IV), pages 1810–1816. IEEE, 2022.

[32] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[33] Benjamin Stoler, Ingrid Navarro, Meghdeep Jana, Soonmin Hwang, Jonathan Francis, and Jean Oh. Safeshift: Safety-informed distribution shifts for robust trajectory prediction in autonomous driving. In 2024 IEEE Intelligent Vehicles Symposium (IV), pages 1179–1186, 2024.

[34] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

[35] Royden Wagner, Omer Sahin Tas, Marvin Klemp, and Carlos Fernandez. Jointmotion: Joint self-supervision for joint motion prediction. In 8th Annual Conference on Robot Learning, 2024.

[36] Julian Wiederer, Arij Bouazizi, Marco Troina, Ulrich Kressel, and Vasileios Belagiannis. Anomaly detection in multi-agent trajectories for automated driving. In Conference on Robot Learning, 2021.

[37] Julian Wiederer, Arij Bouazizi, Marco Troina, Ulrich Kressel, and Vasileios Belagiannis. Anomaly detection in multi-agent trajectories for automated driving. In Proceedings of the 5th Conference on Robot Learning, pages 1223–1233. PMLR, 2022.

[38] Julian Wiederer, Julian Schmidt, Ulrich Kressel, Klaus Dietmayer, and Vasileios Belagiannis. Joint out-of-distribution detection and uncertainty estimation for trajectory prediction. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023.

[39] Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, and James Hays. Argoverse 2: Next generation datasets for self-driving perception and forecasting, 2023.

[40] Yue Yao, Shengchao Yan, Daniel Goehring, Wolfram Burgard, and Joerg Reichardt. Improving out-of-distribution generalization of trajectory prediction for autonomous driving via polynomial representations. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 488–495, 2024.

[41] Yu Yao, Salil Bhatnagar, Markus Mazzola, Vasileios Belagiannis, Igor Gilitschenski, Luigi Palmieri, Simon Razniewski, and Marcel Hallgarten. Agents-llm: Augmentative generation of challenging traffic scenarios with an agentic llm framework. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 18400–18407, 2025.

[42] Wei Zhan, Liting Sun, Di Wang, Haojie Shi, Aubrey Clausse, Maximilian Naumann, Julius Kümmerle, Hendrik Königshof, Christoph Stiller, Arnaud de La Fortelle, and Masayoshi Tomizuka. INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps. arXiv:1910.03088 [cs, eess], 2019.
