VAGNet: Vision-based Accident Anticipation with Global Features

Charith D. Chitraranjan; Vipooshan Vipulananthan

arxiv: 2604.09305 · v3 · submitted 2026-04-10 · 💻 cs.CV


Pith reviewed 2026-05-10 17:43 UTC · model grok-4.3


keywords: accident anticipation, dashcam video, global features, VideoMAE-V2, transformer, graph modules, traffic safety, vision prediction


The pith

Global features from dashcam video let VAGNet anticipate traffic accidents more accurately and with less computation than object-tracking methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes VAGNet, a network that predicts upcoming accidents in dashcam footage by processing global scene features instead of detecting and tracking individual objects. It combines the VideoMAE-V2 foundation model for feature extraction with transformer and graph modules to model scene dynamics. Experiments across the DAD, DoTA, DADA, and Nexar datasets show higher average precision and longer mean time-to-accident while running more efficiently than prior approaches. Readers would care because this design could support faster, real-time alerts in driver assistance systems without heavy onboard processing.

Core claim

VAGNet is a deep neural network that anticipates accidents from dash-cam video by using global features of traffic scenes extracted with VideoMAE-V2, processed through transformer and graph modules, without any explicit object-level features or tracking. This yields higher average precision and mean time-to-accident on the DAD, DoTA, DADA, and Nexar benchmarks while remaining computationally lighter than existing methods that rely on per-object processing.

What carries the argument

VAGNet architecture of transformer and graph modules that process global features extracted by VideoMAE-V2 from entire traffic scenes to predict accidents.
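To make the idea of attention over frame-level global features concrete, here is a minimal pure-Python sketch. It is not the authors' architecture: the per-frame vectors stand in for VideoMAE-V2 outputs, the temporal graph (each frame linked to itself and a few earlier frames), the `window` size, and the linear head weights are all assumptions chosen for illustration, and learned projections are omitted.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_neighbors(t: int, window: int) -> List[int]:
    # Hypothetical temporal graph: frame t is linked to itself
    # and to at most `window` earlier frames.
    return list(range(max(0, t - window), t + 1))


def graph_attention(features: List[Vector], window: int) -> List[Vector]:
    # One attention pass restricted to graph neighbours.
    # No learned query/key/value projections, to keep the sketch short.
    dim = len(features[0])
    scale = math.sqrt(dim)
    mixed = []
    for t, query in enumerate(features):
        nbrs = causal_neighbors(t, window)
        weights = softmax([dot(query, features[j]) / scale for j in nbrs])
        mixed.append(
            [sum(w * features[j][d] for w, j in zip(weights, nbrs)) for d in range(dim)]
        )
    return mixed


def accident_probabilities(
    features: List[Vector], head_weights: Vector, head_bias: float, window: int = 2
) -> List[float]:
    # Sigmoid of a linear head applied to each attended frame feature.
    mixed = graph_attention(features, window)
    return [1.0 / (1.0 + math.exp(-(dot(h, head_weights) + head_bias))) for h in mixed]


# Toy per-frame "global features" (made-up numbers, not real model outputs).
frame_features = [[0.1, 0.0, 0.2], [0.2, 0.1, 0.1], [0.9, 0.8, 0.7], [1.2, 1.1, 0.9]]
probs = accident_probabilities(frame_features, head_weights=[1.0, 1.0, 1.0], head_bias=-1.5)
```

Because each frame only attends to earlier frames, the per-frame probability can be produced online as video arrives, which is the property an anticipation system needs.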

If this is right

Real-time accident anticipation becomes practical for advanced driver assistance systems.
Higher average precision and longer mean time-to-accident provide earlier intervention opportunities.
Lower computational requirements allow deployment without dedicated object detection hardware.
The approach generalizes across the four tested benchmark datasets of varying complexity.
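The two metrics named above can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: average precision is computed as the mean precision at the rank of each positive video, time-to-accident is measured from the first frame whose predicted probability reaches a threshold, and the mean is taken over positive videos and a sweep of thresholds with missed detections counted as zero. The function names, the threshold sweep, and the toy numbers are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    # Rank videos by score, then average the precision at each positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def time_to_accident(
    probs: Sequence[float], accident_frame: int, threshold: float, fps: float
) -> Optional[float]:
    # Seconds between the first alarm and the accident; None if no alarm
    # fires before the accident frame.
    for t, p in enumerate(probs[:accident_frame]):
        if p >= threshold:
            return (accident_frame - t) / fps
    return None


def mean_tta(
    videos: Sequence[Tuple[Sequence[float], int]],
    thresholds: Sequence[float],
    fps: float,
) -> float:
    # Assumption: a missed detection contributes a time-to-accident of 0.
    values: List[float] = []
    for threshold in thresholds:
        for probs, accident_frame in videos:
            tta = time_to_accident(probs, accident_frame, threshold, fps)
            values.append(tta if tta is not None else 0.0)
    return sum(values) / len(values)


# Toy inputs (made-up numbers, not results from the paper).
ap = average_precision(scores=[0.9, 0.8, 0.3, 0.1], labels=[1, 0, 1, 0])
video = ([0.1, 0.2, 0.6, 0.7, 0.9], 4)  # per-frame probabilities, accident at frame 4
tta = time_to_accident(video[0], video[1], threshold=0.5, fps=10.0)
mtta = mean_tta([video], thresholds=[0.5, 0.95], fps=10.0)
```

A longer time-to-accident only helps if precision stays high, which is why the two numbers are reported together.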

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Scene-level context may prove sufficient for other safety-related video tasks, simplifying pipelines that currently depend on object detection.
Combining this global-feature strategy with additional foundation models could improve performance in challenging conditions like night or rain.
Reduced compute needs open the possibility of running such anticipation on lower-power vehicle hardware.

Load-bearing premise

Global features from VideoMAE-V2 contain enough information to anticipate accidents accurately without needing explicit details from individual objects or their interactions.

What would settle it

An evaluation on an additional real-world driving dataset in which VAGNet shows lower average precision, shorter mean time-to-accident, or higher computational cost than object-based baselines would count against the claim.

Figures

Figures reproduced from arXiv: 2604.09305 by Charith D. Chitraranjan, Vipooshan Vipulananthan.

**Figure 2.** Distribution of crash-object categories in the DoTA

**Figure 3.** Samples showing the variation of predicted accident probability across time/

**Figure 4.** Samples of visual explanations for accidents across different weather/lighting conditions. For each condition, the top row shows the original video frames in temporal order, and the bottom row shows the corresponding Grad-CAM [45] maps highlighting areas in the image that contribute to accident anticipation. Higher intensity indicates regions that have a stronger influence on the model’s anticipation of a …

Original abstract

Traffic accidents are a leading cause of fatalities and injuries across the globe. Therefore, the ability to anticipate hazardous situations in advance is essential. Automated accident anticipation enables timely intervention through driver alerts and collision avoidance maneuvers, forming a key component of advanced driver assistance systems. In autonomous driving, such predictive capabilities support proactive safety behaviors, such as initiating defensive driving and human takeover when required. Using dashcam video as input offers a cost-effective solution, but it is challenging due to the complexity of real-world driving scenes. Accident anticipation systems need to operate in real-time. However, current methods involve extracting features from each detected object, which is computationally intensive. We propose VAGNet, a deep neural network that learns to predict accidents from dash-cam video using global features of traffic scenes without requiring explicit object-level features. The network consists of transformer and graph modules, and we use the vision foundation model VideoMAE-V2 for global feature extraction. Experiments on four benchmark datasets (DAD, DoTA, DADA, and Nexar) show that our method anticipates accidents with higher average precision and mean time-to-accident while being computationally more efficient compared to existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

VAGNet shows global VideoMAE-V2 features plus transformers and graphs can beat object-based baselines on accident anticipation benchmarks while using less compute, but the results rest on whether those features actually resolve the local motions that matter.

The main takeaway is that this paper gets accident anticipation working from global scene features alone, skipping the usual object detection and tracking steps that slow things down in dashcam video. It reports higher average precision and mean time-to-accident on DAD, DoTA, DADA, and Nexar, with a clear efficiency advantage from the pretrained VideoMAE-V2 backbone and the added transformer-graph layers.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes VAGNet, a deep neural network for anticipating traffic accidents from dashcam video. It extracts global scene features using the pretrained VideoMAE-V2 vision foundation model, processes them via transformer and graph modules, and predicts accidents without any explicit object detection, tracking, or local feature extraction. Experiments on the DAD, DoTA, DADA, and Nexar benchmarks are reported to yield higher average precision and mean time-to-accident than prior methods while also improving computational efficiency.

Significance. If the central empirical claims hold under rigorous validation, the work would be significant for real-time autonomous driving and ADAS applications. By showing that global-only representations can outperform object-centric pipelines, it could reduce the computational cost of accident anticipation and simplify deployment on edge devices. The approach also demonstrates effective transfer of large-scale pretrained video models to safety-critical prediction tasks.

major comments (3)

[Method (§3)] The central claim that global VideoMAE-V2 features suffice without object-level cues is load-bearing, yet the architecture description does not include an ablation that isolates the contribution of the global-only design (e.g., a variant with added object detections or local patch features). Without this, it is impossible to determine whether reported gains stem from the global-feature hypothesis or from other modeling choices.
[Experiments (§4)] The performance claims (higher AP and mTTA on four datasets) are presented without reported details on experimental protocol: number of random seeds, statistical significance tests, variance across runs, or exact baseline re-implementations. This gap directly affects verifiability of the efficiency and accuracy improvements asserted in the abstract.
[Results and Discussion (§5)] No qualitative analysis or failure-case examination is provided to test whether global patch embeddings preserve the fine-grained relative motions (e.g., sudden cut-ins or pedestrian incursions) that drive many accidents in DAD/DoTA. Such analysis is required to address the risk that performance reflects dataset correlations rather than true anticipation capability.

minor comments (2)

[Abstract] The abstract states performance improvements but omits numerical deltas or efficiency metrics (e.g., FPS or FLOPs); adding these would improve clarity.
[Method (§3.2)] Notation for the graph module and transformer integration could be made more explicit (e.g., defining the adjacency matrix construction) to aid reproducibility.
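One way to report the throughput figure requested in the first minor comment is a simple timing harness. The sketch below is an assumption-laden illustration: `measure_fps`, `dummy_model`, the warm-up count, and the repeat count are hypothetical names and settings, and the placeholder workload stands in for a real forward pass.

```python
import time
from typing import Callable, List, Sequence

Frame = List[float]


def measure_fps(
    model_fn: Callable[[Sequence[Frame]], object],
    frames: Sequence[Frame],
    warmup: int = 5,
    repeats: int = 50,
) -> float:
    # Warm-up runs are excluded from timing so one-off setup costs
    # do not distort the measurement.
    for _ in range(warmup):
        model_fn(frames)
    start = time.perf_counter()
    for _ in range(repeats):
        model_fn(frames)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("timer resolution too coarse; increase repeats")
    # Frames processed per second across all timed runs.
    return repeats * len(frames) / elapsed


def dummy_model(frames: Sequence[Frame]) -> List[float]:
    # Placeholder workload standing in for a real model forward pass.
    return [sum(v * v for v in frame) for frame in frames]


clip = [[float(i % 7) for i in range(1000)] for _ in range(16)]
fps = measure_fps(dummy_model, clip)
```

Throughput measured this way depends on hardware and batch size, so those settings would need to be stated next to any reported number.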

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. We address each major comment below and commit to revisions that strengthen the manuscript's rigor and verifiability.

Referee: [Method (§3)] The central claim that global VideoMAE-V2 features suffice without object-level cues is load-bearing, yet the architecture description does not include an ablation that isolates the contribution of the global-only design (e.g., a variant with added object detections or local patch features). Without this, it is impossible to determine whether reported gains stem from the global-feature hypothesis or from other modeling choices.

Authors: We agree that an explicit ablation isolating the global-only design is necessary to substantiate the central hypothesis. In the revised manuscript we will add a controlled ablation study that compares the full VAGNet model against variants augmented with object detections (using an off-the-shelf detector) and with local patch features, while keeping all other components fixed. This will clarify whether the reported gains derive primarily from the global VideoMAE-V2 representation.
Revision: yes
Referee: [Experiments (§4)] The performance claims (higher AP and mTTA on four datasets) are presented without reported details on experimental protocol: number of random seeds, statistical significance tests, variance across runs, or exact baseline re-implementations. This gap directly affects verifiability of the efficiency and accuracy improvements asserted in the abstract.

Authors: We acknowledge that the current experimental section lacks sufficient protocol details for full reproducibility. In the revision we will report: (i) the number of random seeds used for training and evaluation, (ii) standard deviations across runs, (iii) results of statistical significance tests (e.g., paired t-tests against baselines), and (iv) precise descriptions of how each baseline was re-implemented, including any hyper-parameter choices and hardware settings.
Revision: yes
Referee: [Results and Discussion (§5)] No qualitative analysis or failure-case examination is provided to test whether global patch embeddings preserve the fine-grained relative motions (e.g., sudden cut-ins or pedestrian incursions) that drive many accidents in DAD/DoTA. Such analysis is required to address the risk that performance reflects dataset correlations rather than true anticipation capability.

Authors: We concur that qualitative evidence is required to demonstrate that global embeddings capture the critical fine-grained motions. We will add a new subsection containing: (a) attention-map visualizations on representative DAD and DoTA sequences highlighting sudden cut-ins and pedestrian incursions, and (b) a failure-case analysis that categorizes errors and discusses whether they stem from limitations of global features versus other factors.
Revision: yes
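The protocol items promised in the second response (per-seed results, standard deviations, and a paired t-test) can be sketched with the Python standard library. The AP values below are placeholders invented purely to exercise the code and are not results from the paper; `summarize` and `paired_t_statistic` are hypothetical helper names. The sketch returns the t statistic and degrees of freedom only, since converting them to a p-value needs a t-distribution table or a statistics package.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    # Mean and sample standard deviation across seeds.
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    # Paired t-test statistic for matched runs (same seed, same split).
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long sequences with at least 2 runs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder AP values for five seeds (made up for illustration only).
model_ap = [0.70, 0.72, 0.71, 0.73, 0.69]
baseline_ap = [0.66, 0.69, 0.68, 0.68, 0.67]

model_mean, model_std = summarize(model_ap)
t_stat, dof = paired_t_statistic(model_ap, baseline_ap)
```

Pairing by seed removes run-to-run variance that both models share, so the test is more sensitive than comparing two independent means.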

Circularity Check

0 steps flagged

No circularity in empirical architecture proposal and benchmark evaluation

The paper proposes VAGNet as a DNN architecture that extracts global features via the external pretrained VideoMAE-V2 model, then processes them with transformer and graph modules to anticipate accidents from dashcam video. Performance is asserted solely via direct experiments on four independent external benchmark datasets (DAD, DoTA, DADA, Nexar), reporting higher AP, mTTA, and efficiency versus prior methods. No equations, derivations, fitted-parameter predictions, or self-citation chains appear in the abstract or description; the central claim does not reduce to its inputs by construction and remains falsifiable against the cited benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the pre-trained VideoMAE-V2 model providing adequate global features and on the new network architecture being effective for this task; no invented entities are introduced.

free parameters (1)

Neural network hyperparameters and weights
Standard in deep learning; model parameters are optimized on the training splits of the benchmark datasets.

axioms (1)

Domain assumption: global features from VideoMAE-V2 capture sufficient information for accident anticipation without object-level details.
The method deliberately avoids object detection and relies on this premise for both accuracy and efficiency.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We propose VAGNet, a deep neural network that learns to predict accidents from dash-cam video using global features of traffic scenes without requiring explicit object-level features. The network consists of transformer and graph modules, and we use the vision foundation model VideoMAE-V2 for global feature extraction."
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "A Graph Transformer layer is then applied to process the global frame-level features"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 2 internal anchors

[1] Road traffic injuries. https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries, 2023. [Online; accessed 24-December-2025]
[2] Fred L Mannering, Venky Shankar, and Chandra R Bhat. Unobserved heterogeneity and the statistical analysis of highway accident data. Analytic methods in accident research, 11:1–16, 2016
[3] Muhammad Monjurul Karim, Yu Li, and Ruwen Qin. Toward explainable artificial intelligence for early anticipation of traffic accidents. Transportation research record, 2676(6):743–755, 2022
[4] Ting Zhang, Zixuan Wang, Hong Wang, and Jun Li. Intelligent defensive driving for autonomous vehicles: Framework, strategy and verification. Accident Analysis & Prevention, 226:108355, 2026
[5] Konstantinos Mattas, Giovanni Albano, Riccardo Donà, Maria Christina Galassi, Ricardo Suarez-Bertoa, Sandor Vass, and Biagio Ciuffo. Driver models for the definition of safety requirements of automated vehicles in international regulations. application to motorway driving conditions. Accident Analysis & Prevention, 174:106743, 2022
[6] Daofei Li, Yangye Jiang, Jiajie Zhang, and Bin Xiao. Smpc-based motion planning of automated vehicle when interacting with occluded pedestrians. IEEE Transactions on Intelligent Transportation Systems, 2024
[7] Jiaxin Liu, Xiangyu Yan, Liang Peng, Lei Yang, Lingjun Zhang, Yuechen Luo, Yueming Tao, Ashton Yu Xuan Tan, Mu Li, Lei Zhang, et al. Seeing before observable: Potential risk reasoning in autonomous driving via vision language models. arXiv preprint arXiv:2511.22928, 2025
[8] Sule Tekkesinoglu, Azra Habibovic, and Lars Kunze. Advancing explainable autonomous vehicle systems: A comprehensive review and research roadmap. ACM Transactions on Human-Robot Interaction, 14(3):1–46, 2025
[9] Pranav Singh Chib and Pravendra Singh. Recent advancements in end-to-end autonomous driving using deep learning: A survey. IEEE Transactions on Intelligent Vehicles, 9(1):103–118, 2023
[10] Henry X Liu and Shuo Feng. Curse of rarity for autonomous vehicles. Nature communications, 15(1):4808, 2024
[11] Yifan Zhang, Bingyi Kang, Bryan Hooi, Shuicheng Yan, and Jiashi Feng. Deep long-tailed learning: A survey. IEEE transactions on pattern analysis and machine intelligence, 45(9):10795–10816, 2023
[12] Daniel Moura, Shizhan Zhu, and Orly Zvitia. Nexar dashcam collision prediction dataset and challenge. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2583–2591, 2025
[13] Yu Yao, Xizi Wang, Mingze Xu, Zelin Pu, Yuchen Wang, Ella Atkins, and David Crandall. Dota: unsupervised detection of traffic anomaly in driving videos. IEEE transactions on pattern analysis and machine intelligence, 2022
[14] Jianwu Fang, Dingxin Yan, Jiahuan Qiao, Jianru Xue, He Wang, and Sen Li. Dada-2000: Can driving accident be predicted by driver attention? analyzed by a benchmark. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 4303–4309. IEEE, 2019
[15] Shih-Yuan Yu, Arnav Vaibhav Malawade, Deepan Muthirayan, Pramod P Khargonekar, and Mohammad Abdullah Al Faruque. Scene-graph augmented data-driven risk assessment of autonomous vehicle decisions. IEEE Transactions on Intelligent Transportation Systems, 23(7):7941–7951, 2021
[16] Retrofit Collision Warning System Gives Older Vehicles A Safety Boost. https://trid.trb.org/View/1574810, 2018. [Online; accessed 27-October-2025]
[17] David B Yoffie. Mobileye: The future of driverless cars. Harvard Business School Case, pages 715–421, 2014
[18] Jianwu Fang, Jiahuan Qiao, Jianru Xue, and Zhengguo Li. Vision-based traffic accident detection and anticipation: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 2023
[19] Wei Liu, Yafei Li, Tao Zhang, Yixiang Gao, Longsheng Wei, and Jun Chen. Ccaf-net: Cascade complementarity-aware fusion network for traffic accident prediction in dashcam videos. Neurocomputing, 624:129285, 2025
[20] Yuanhong Zhong, Ge Yan, Ruyue Zhu, Ping Gan, and Xuerui Shen. Early traffic accident anticipation via feature consistency representation and soft label regression. ACM Transactions on Multimedia Computing, Communications and Applications, 2025
[21] Nupur Thakur, PrasanthSai Gouripeddi, and Baoxin Li. Graph (graph): A nested graph-based framework for early accident anticipation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7533–7541, 2024
[22] Muhammad Monjurul Karim, Yu Li, Ruwen Qin, and Zhaozheng Yin. A dynamic spatial-temporal attention network for early anticipation of traffic accidents. IEEE Transactions on Intelligent Transportation Systems, 23(7):9590–9600, 2022
[23] Arnav Vaibhav Malawade, Shih-Yuan Yu, Brandon Hsu, Deepan Muthirayan, Pramod P Khargonekar, and Mohammad Abdullah Al Faruque. Spatiotemporal scene-graph embedding for autonomous vehicle collision prediction. IEEE Internet of Things Journal, 9(12):9379–9388, 2022
[24] Wentao Bao, Qi Yu, and Yu Kong. Drive: Deep reinforced accident anticipation with visual explanation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7619–7628, 2021
[25] Tomoyuki Suzuki, Hirokatsu Kataoka, Yoshimitsu Aoki, and Yutaka Satoh. Anticipating traffic accidents with adaptive loss and large-scale incident db. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3521–3529, 2018
[26] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014
[27] Vipooshan Vipulananthan, Kumudu Mohottala, Kavindu Chinthana, Nimsara Paramulla, and Charith Chitraranjan. Stagnet: A spatio-temporal graph and lstm framework for accident anticipation. IEEE Access, 13:213769–213779, 2025. doi: 10.1109/ACCESS.2025.3645127
[29] Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, and Yu Qiao. Videomae v2: Scaling video masked autoencoders with dual masking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14549–14560, 2023
[30] Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? a new model and the kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017
[31] Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. Slowfast networks for video recognition. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6201–6210, 2018. URL https://api.semanticscholar.org/CorpusID:54463801
[32] Oriane Siméoni, Huy V Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. Dinov3. arXiv preprint arXiv:2508.10104, 2025
[33] Fu-Hsiang Chan, Yu-Ting Chen, Yu Xiang, and Min Sun. Anticipating accidents in dashcam videos. In Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part IV 13, pages 136–153. Springer, 2017
[34] Wentao Bao, Qi Yu, and Yu Kong. Uncertainty-based traffic accident anticipation with spatio-temporal relational learning. In Proceedings of the 28th ACM International Conference on Multimedia, pages 2682–2690, 2020
[35] Farhan Mahmood, Daehyeon Jeong, and Jeha Ryu. A new approach to traffic accident anticipation with geometric features for better generalizability. IEEE Access, 11:29263–29274, 2023
[36] Charith Chitraranjan, Vipooshan Vipulananthan, and Thuvarakan Sritharan. Vision-based collision warning systems with deep learning: A systematic review. Journal of Imaging, 11(2):64, 2025
[37] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28, 2015
[38] Yuto Kumamoto, Kento Ohtani, Daiki Suzuki, Minori Yamataka, and Kazuya Takeda. Aat-da: Accident anticipation transformer with driver attention. In Proceedings of the Winter Conference on Applications of Computer Vision, pages 1142–1151, 2025
[39] Tianci Zhao, Xue Bai, Jianwu Fang, and Jianru Xue. Gated driver attention predictor. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pages 270–276. IEEE, 2023
[40] Lei-Lei Li, Jianwu Fang, and Jianru Xue. Cognitive traffic accident anticipation. IEEE Intelligent Transportation Systems Magazine, 16(5):17–32, 2024
[41] Jianwu Fang, Lei-Lei Li, Kuan Yang, Zhedong Zheng, Jianru Xue, and Tat-Seng Chua. Cognitive accident prediction in driving scenes: A multimodality benchmark. arXiv preprint arXiv:2212.09381, 2022
[42]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

work page 2017
[43]

Masked label prediction: Uni- fied message passing model for semi-supervised classi- fication, 2021

Yunsheng Shi, Zhengjie Huang, Shikun Feng, Hui Zhong, Wenjin Wang, and Yu Sun. Masked label prediction: Uni- fied message passing model for semi-supervised classi- fication, 2021. URLhttps://arxiv.org/abs/2009. 03509

work page 2021
[44]

The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets.PloS one, 10 (3):e0118432, 2015

Takaya Saito and Marc Rehmsmeier. The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets.PloS one, 10 (3):e0118432, 2015

work page 2015
[45]

Bdd100k: A diverse driving dataset for heteroge- neous multitask learning

Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingy- ing Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. Bdd100k: A diverse driving dataset for heteroge- neous multitask learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2636–2645, 2020

work page 2020
[46]

Grad-cam: Visual explanations from deep networks via gradient-based localization

Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Ba- tra. Grad-cam: Visual explanations from deep networks via gradient-based localization. InProceedings of the IEEE international conference on computer vision, pages 618–626, 2017

work page 2017
[47]

Gsc: A graph and spatio-temporal continuity based framework for accident anticipation.IEEE Transactions on Intelligent V ehicles, 9 (1):2249–2261, 2023

Tianhang Wang, Kai Chen, Guang Chen, Bin Li, Zhijun Li, Zhengfa Liu, and Changjun Jiang. Gsc: A graph and spatio-temporal continuity based framework for accident anticipation.IEEE Transactions on Intelligent V ehicles, 9 (1):2249–2261, 2023

work page 2023
[48]

Dynamic attention augmented graph net- work for video accident anticipation.Pattern Recognition, 147:110071, 2024

Wenfeng Song, Shuai Li, Tao Chang, Ke Xie, Aimin Hao, and Hong Qin. Dynamic attention augmented graph net- work for video accident anticipation.Pattern Recognition, 147:110071, 2024

work page 2024
[49]

Mobilenetv2: In- verted residuals and linear bottlenecks

Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: In- verted residuals and linear bottlenecks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4510–4520, 2018

[50] THOP: A tool to count the FLOPs of PyTorch model. https://pypi.org/project/thop/, 2022. [Online; accessed 14-November-2025]
