Forecasting the Past: Gradient-Based Distribution Shift Detection in Trajectory Prediction
Pith reviewed 2026-05-10 15:35 UTC · model grok-4.3
The pith
A gradient-based score from a post-hoc decoder trained to forecast the second half of trajectories detects distribution shifts without changing the original prediction model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the L2 norm of the gradient of an auxiliary self-supervised forecasting loss with respect to the decoder's final layer provides an effective score for detecting distribution shifts in trajectory prediction tasks, achieving substantial improvements on benchmark datasets while ensuring no interference with the original model's performance.
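In symbols, the claim can be sketched as follows. The notation is an assumption introduced here for illustration, since this summary does not reproduce the paper's own symbols: x_{1:T} is an observed trajectory of T steps, g_φ is the post-hoc decoder, θ is the subset of φ belonging to the decoder's final layer, and L_aux is the auxiliary forecasting loss. Whether the decoder consumes raw first-half coordinates or features from the frozen predictor is not stated here, so the input is written generically.

```latex
\[
  s(x_{1:T}) \;=\;
  \Bigl\lVert
    \nabla_{\theta}\,
    \mathcal{L}_{\mathrm{aux}}\bigl(
      g_{\phi}(x_{1:\lfloor T/2 \rfloor}),\;
      x_{\lfloor T/2 \rfloor + 1:T}
    \bigr)
  \Bigr\rVert_{2}
\]
```

Under this reading, a larger s is presumably taken as evidence of shift; the direction of the score should be checked against the paper itself.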
What carries the argument
The L2 norm of the gradient of the auxiliary forecasting loss with respect to the decoder's final layer, which acts as a distribution shift score.
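A minimal sketch of such a score, assuming the decoder's final layer is linear (y = W h + b) and the auxiliary loss is a mean squared error; neither assumption is confirmed by this summary. The function name `final_layer_grad_norm` and its arguments are hypothetical. The gradient is written out in closed form so the example needs no autograd library; with a real decoder one would backpropagate instead.

```python
import math

def final_layer_grad_norm(W, b, h, target):
    """L2 norm of d(MSE)/d(W, b) for an assumed linear final layer y = W h + b.

    W: list of rows, b: list of biases, h: hidden features entering the
    final layer, target: flattened second-half trajectory to be forecast.
    """
    n = len(target)
    # Forward pass through the final layer only.
    y = [sum(w * x for w, x in zip(row, h)) + b_i for row, b_i in zip(W, b)]
    # Residual of the forecast against the held-back second half.
    r = [y_i - t_i for y_i, t_i in zip(y, target)]
    # dL/db[i] = (2/n) * r[i];  dL/dW[i][j] = (2/n) * r[i] * h[j]
    sq = 0.0
    for r_i in r:
        g_b = 2.0 / n * r_i
        sq += g_b ** 2
        for h_j in h:
            sq += (g_b * h_j) ** 2
    return math.sqrt(sq)

# Toy usage with made-up numbers: a perfect forecast gives a zero score.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
h = [1.0, 2.0]
print(final_layer_grad_norm(W, b, h, [1.0, 2.0]))  # 0.0
print(final_layer_grad_norm(W, b, h, [0.0, 0.0]))  # sqrt(30), about 5.477
```

For this linear case the norm factorises into the residual norm times a term depending on the feature norm, which illustrates why the score reacts both to forecast error and to unusual features.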
Load-bearing premise
The L2 norm of the gradient of the auxiliary forecasting loss reliably indicates distribution shifts that matter for the downstream trajectory prediction task.
What would settle it
If experiments on the Shifts or Argoverse datasets show that the proposed gradient norm score does not outperform existing distribution shift detection methods in terms of detection accuracy or AUROC, the claim would be falsified.
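The AUROC comparison mentioned above can be sketched in a few lines. This is an illustrative implementation of the standard pairwise (Mann-Whitney) form of AUROC, not the paper's evaluation code; the function name `auroc` and the assumption that higher scores mean "shifted" are introduced here.

```python
def auroc(scores_in, scores_shifted):
    """Probability that a shifted sample scores higher than an
    in-distribution sample, counting ties as one half."""
    wins = 0.0
    for s in scores_shifted:
        for t in scores_in:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(scores_in) * len(scores_shifted))

# Made-up scores: perfect separation, then one misordered pair out of four.
print(auroc([0.1, 0.2, 0.3], [0.4, 0.5]))  # 1.0
print(auroc([0.1, 0.4], [0.2, 0.5]))       # 0.75
```

The double loop is quadratic in the number of samples, which is fine for a sanity check but would be replaced by a rank-based computation on full benchmark splits.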
Original abstract
Trajectory prediction models often fail in real-world automated driving due to distributional shifts between training and test conditions. Such distributional shifts, whether behavioural or environmental, pose a critical risk by causing the model to make incorrect forecasts in unfamiliar situations. We propose a self-supervised method that trains a decoder in a post-hoc fashion on the self-supervised task of forecasting the second half of observed trajectories from the first half. The L2 norm of the gradient of this forecasting loss with respect to the decoder's final layer defines a score to identify distribution shifts. Our approach, first, does not affect the trajectory prediction model, ensuring no interference with original prediction performance and second, demonstrates substantial improvements on distribution shift detection for trajectory prediction on the Shifts and Argoverse datasets. Moreover, we show that this method can also be used to early detect collisions of a deep Q-Network motion planner in the Highway simulator. Source code is available at https://github.com/Michedev/forecasting-the-past.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a post-hoc self-supervised approach for distribution shift detection in trajectory prediction models. An auxiliary decoder is trained to forecast the second half of a trajectory from the first half; the L2 norm of the gradient of this auxiliary loss with respect to the decoder's final layer is used as a shift-detection score. The method is asserted to leave the original predictor untouched and to yield substantial improvements on the Shifts and Argoverse datasets, with an additional demonstration on early collision detection for a DQN planner in the Highway simulator.
Significance. If the auxiliary gradient norm reliably flags shifts that degrade the main predictor, the approach would supply a lightweight, non-intrusive monitoring tool for deployed trajectory models in autonomous driving. The post-hoc, self-supervised construction avoids any interference with original performance and re-uses existing trajectory data, which are practical advantages. Source-code release further supports reproducibility.
major comments (3)
- [Experiments] The central claim that the L2 gradient norm on the auxiliary forecasting loss detects shifts relevant to the original predictor rests on an untested alignment between auxiliary sensitivity and main-task failure modes. No analysis is provided showing correlation between high detection scores and elevated ADE/FDE on the untouched trajectory model under the reported shifts (Experiments section).
- [Section 4] Quantitative results for the claimed 'substantial improvements' on Shifts and Argoverse are not summarized with baselines, error bars, or ablation details on the auxiliary task and chosen gradient layer, making it impossible to assess whether the gains are robust or merely reflect the auxiliary decoder's own sensitivity (Section 4).
- [Application to DQN planner] The extension to early collision detection in the DQN Highway planner requires clarification on how the auxiliary decoder and gradient score are adapted to the planner's state representation and what quantitative metric defines 'early' detection (final application paragraph).
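The analysis requested in the first major comment could look roughly like the sketch below: compute ADE/FDE for the frozen predictor on each sample, then rank-correlate those errors with the shift scores. All names (`ade_fde`, `ranks`, `spearman`) are hypothetical, trajectories are assumed to be lists of 2D points of equal length, and Spearman correlation is one reasonable choice rather than anything the paper or the referee prescribes.

```python
import math

def ade_fde(pred, truth):
    """Average and final displacement error between two 2D trajectories."""
    dists = [math.dist(p, q) for p, q in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of the ranks.
    Undefined (division by zero) if either input is constant."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (c - my) for a, c in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((c - my) ** 2 for c in ry))
    return cov / (sx * sy)

# Made-up example: one prediction that drifts 2 m off at the final step.
print(ade_fde([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 2.0)]))  # (1.0, 2.0)
```

A correlation near zero between scores and ADE would support the referee's concern that the auxiliary task and the main task fail on different inputs.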
minor comments (2)
- [Abstract] The phrasing 'first, does not affect... and second, demonstrates' is grammatically awkward and should be restructured for clarity.
- [Method] Notation for the auxiliary loss and the exact parameters θ (decoder final layer) should be defined explicitly in the method section to avoid ambiguity when readers reproduce the gradient computation.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline revisions to strengthen the presentation of our results and claims.
- Referee: [Experiments] The central claim that the L2 gradient norm on the auxiliary forecasting loss detects shifts relevant to the original predictor rests on an untested alignment between auxiliary sensitivity and main-task failure modes. No analysis is provided showing correlation between high detection scores and elevated ADE/FDE on the untouched trajectory model under the reported shifts (Experiments section).
  Authors: We agree that an explicit analysis correlating the auxiliary gradient-norm scores with degradation in the original predictor's ADE/FDE would more directly substantiate the relevance of the detected shifts. The current validation relies on improved AUROC for shift detection on the Shifts and Argoverse benchmarks, where the shifts are known to impact trajectory prediction performance. To address this point, we will add a new analysis (including scatter plots or correlation coefficients) in the Experiments section of the revised manuscript that quantifies the relationship between high detection scores and elevated prediction errors on the frozen main model.
  Revision: yes
- Referee: [Section 4] Quantitative results for the claimed 'substantial improvements' on Shifts and Argoverse are not summarized with baselines, error bars, or ablation details on the auxiliary task and chosen gradient layer, making it impossible to assess whether the gains are robust or merely reflect the auxiliary decoder's own sensitivity (Section 4).
  Authors: The results in Section 4 compare our method against multiple distribution-shift detection baselines and report AUROC improvements. However, we acknowledge that the presentation would benefit from error bars across random seeds and expanded ablations on auxiliary-task hyperparameters and gradient-layer selection. In the revision we will include these elements (error bars from at least five runs and additional ablation tables) to demonstrate robustness and rule out sensitivity artifacts.
  Revision: yes
- Referee: [Application to DQN planner] The extension to early collision detection in the DQN Highway planner requires clarification on how the auxiliary decoder and gradient score are adapted to the planner's state representation and what quantitative metric defines 'early' detection (final application paragraph).
  Authors: The auxiliary decoder is trained on the same state trajectories used by the DQN planner (position, velocity, and heading sequences extracted from the simulator). The self-supervised forecasting task and gradient-norm computation are applied identically to the trajectory-prediction case. 'Early' detection is quantified by the number of timesteps before a collision at which the score exceeds a threshold, together with the resulting reduction in collision rate when the score triggers a safety intervention. We will expand the final paragraph with these details and add a short table of lead-time and collision-rate metrics in the revised manuscript.
  Revision: partial
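The lead-time metric described in the last response can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the function name `lead_time` is hypothetical, scores are assumed to arrive one per simulator timestep, and an alarm is assumed to fire at the first step whose score strictly exceeds the threshold.

```python
def lead_time(scores, threshold, collision_step):
    """Timesteps between the first threshold crossing and the collision.

    Returns None if the score never exceeds the threshold before the
    collision step (a missed detection).
    """
    for t, s in enumerate(scores[:collision_step]):
        if s > threshold:
            return collision_step - t
    return None

# Made-up episode: collision at step 4, first alarm at step 2.
print(lead_time([0.1, 0.2, 0.9, 0.95, 1.0], threshold=0.5, collision_step=4))  # 2
print(lead_time([0.1, 0.2], threshold=0.5, collision_step=2))                  # None
```

Averaging this value over episodes, and reporting the fraction of `None` results separately, would give the kind of lead-time table the authors promise.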
Circularity Check
No circularity: auxiliary gradient norm defined independently of main-task performance
The paper defines its distribution-shift score directly as the L2 norm of the gradient of a post-hoc auxiliary self-supervised forecasting loss (second-half trajectory from first half) with respect to the decoder's final layer. This construction is independent of the original trajectory predictor's ADE/FDE or any fitted parameters from the main task. No equations reduce the score to a self-referential quantity, a fitted input renamed as prediction, or a self-citation chain. The method is explicitly post-hoc and non-interfering. Evaluation on Shifts and Argoverse is empirical comparison, not a derivation that collapses to the inputs by construction. This is the normal non-circular case.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: observed trajectories can be split into first and second halves that support meaningful self-supervised forecasting.
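To make the assumption concrete, the split and the resulting forecasting target might look like the sketch below. The names `split_halves` and `forecast_mse` are hypothetical, trajectories are assumed to be lists of 2D points, and the handling of odd lengths (the extra point goes to the second half) is a choice made here, not something this summary attributes to the paper.

```python
def split_halves(trajectory):
    """Split an observed trajectory into (first half, second half)."""
    mid = len(trajectory) // 2
    return trajectory[:mid], trajectory[mid:]

def forecast_mse(pred, target):
    """Mean squared error over all coordinates of two 2D point lists."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / (2 * len(target))

# Made-up observed history of four points along the x axis.
history = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
first, second = split_halves(history)
print(first)   # [(0.0, 0.0), (1.0, 0.0)]
print(second)  # [(2.0, 0.0), (3.0, 0.0)]
```

The decoder would be trained to map `first` to `second`, with `forecast_mse` (or whichever loss the paper actually uses) as the auxiliary objective.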