arxiv: 2604.09305 · v3 · submitted 2026-04-10 · 💻 cs.CV

VAGNet: Vision-based Accident Anticipation with Global Features

Pith reviewed 2026-05-10 17:43 UTC · model grok-4.3

classification 💻 cs.CV
keywords accident anticipation · dashcam video · global features · VideoMAE-V2 · transformer · graph modules · traffic safety · vision prediction

The pith

Global features from dashcam video let VAGNet anticipate traffic accidents more accurately and with less computation than object-tracking methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes VAGNet, a network that predicts upcoming accidents in dashcam footage by processing global scene features instead of detecting and tracking individual objects. It combines the VideoMAE-V2 foundation model for feature extraction with transformer and graph modules to model scene dynamics. Experiments across the DAD, DoTA, DADA, and Nexar datasets show higher average precision and longer mean time-to-accident while running more efficiently than prior approaches. Readers would care because this design could support faster, real-time alerts in driver assistance systems without heavy onboard processing.

Core claim

VAGNet is a deep neural network that anticipates accidents from dash-cam video by using global features of traffic scenes extracted with VideoMAE-V2, processed through transformer and graph modules, without any explicit object-level features or tracking. This yields higher average precision and mean time-to-accident on the DAD, DoTA, DADA, and Nexar benchmarks while remaining computationally lighter than existing methods that rely on per-object processing.

What carries the argument

The VAGNet architecture: transformer and graph modules that process global features, extracted by VideoMAE-V2 from entire traffic scenes, to predict accidents.

If this is right

  • Real-time accident anticipation becomes practical for advanced driver assistance systems.
  • Higher average precision and longer mean time-to-accident provide earlier intervention opportunities.
  • Lower computational requirements allow deployment without dedicated object detection hardware.
  • The approach generalizes across the four tested benchmark datasets of varying complexity.
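The two headline metrics in the list above can be made concrete with a small sketch. It assumes the definitions commonly used in accident anticipation work: average precision as the mean of the precision values at each correctly ranked positive video, and time-to-accident as the gap between the first frame whose predicted probability crosses a threshold and the annotated accident frame. The function names, the 0.5 threshold, and the frame-rate argument are illustrative assumptions, not the paper's evaluation code, and the exact protocol used by the authors may differ.

```python
from typing import Optional, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values measured at the rank of each positive item."""
    # Rank videos from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def time_to_accident(
    probs: Sequence[float],
    accident_frame: int,
    fps: float,
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> Optional[float]:
    """Seconds between the first threshold crossing and the accident frame.

    Returns None when the probability never crosses the threshold in time.
    """
    for frame, prob in enumerate(probs[:accident_frame]):
        if prob >= threshold:
            return (accident_frame - frame) / fps
    return None


def mean_time_to_accident(ttas: Sequence[Optional[float]]) -> float:
    """Average over the videos in which the accident was anticipated."""
    valid = [t for t in ttas if t is not None]
    return sum(valid) / len(valid) if valid else 0.0
```

Under these assumed definitions, a model that raises its alarm earlier gets a larger time-to-accident, which is why the two metrics are usually reported together: a high mTTA obtained by alarming on every video would show up as a low average precision.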

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Scene-level context may prove sufficient for other safety-related video tasks, simplifying pipelines that currently depend on object detection.
  • Combining this global-feature strategy with additional foundation models could improve performance in challenging conditions like night or rain.
  • Reduced compute needs open the possibility of running such anticipation on lower-power vehicle hardware.

Load-bearing premise

Global features from VideoMAE-V2 contain enough information to anticipate accidents accurately without needing explicit details from individual objects or their interactions.

What would settle it

An evaluation on an additional real-world driving dataset in which VAGNet shows lower average precision, shorter mean time-to-accident, or higher computational cost than object-based baselines.

Figures

Figures reproduced from arXiv: 2604.09305 by Charith D. Chitraranjan, Vipooshan Vipulananthan.

Figure 1: Architecture of the proposed VAGNet accident anticipation framework.
Figure 2: Distribution of crash-object categories in the DoTA …
Figure 3: Samples showing the variation of predicted accident probability across time/…
Figure 4: Samples of visual explanations for accidents across different weather/lighting conditions. For each condition, the top row shows the original video frames in temporal order, and the bottom row shows the corresponding Grad-CAM [45] maps highlighting areas in the image that contribute to accident anticipation. Higher intensity indicates regions that have a stronger influence on the model’s anticipation of a …

Original abstract

Traffic accidents are a leading cause of fatalities and injuries across the globe. Therefore, the ability to anticipate hazardous situations in advance is essential. Automated accident anticipation enables timely intervention through driver alerts and collision avoidance maneuvers, forming a key component of advanced driver assistance systems. In autonomous driving, such predictive capabilities support proactive safety behaviors, such as initiating defensive driving and human takeover when required. Using dashcam video as input offers a cost-effective solution, but it is challenging due to the complexity of real-world driving scenes. Accident anticipation systems need to operate in real-time. However, current methods involve extracting features from each detected object, which is computationally intensive. We propose VAGNet, a deep neural network that learns to predict accidents from dash-cam video using global features of traffic scenes without requiring explicit object-level features. The network consists of transformer and graph modules, and we use the vision foundation model VideoMAE-V2 for global feature extraction. Experiments on four benchmark datasets (DAD, DoTA, DADA, and Nexar) show that our method anticipates accidents with higher average precision and mean time-to-accident while being computationally more efficient compared to existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes VAGNet, a deep neural network for anticipating traffic accidents from dashcam video. It extracts global scene features using the pretrained VideoMAE-V2 vision foundation model, processes them via transformer and graph modules, and predicts accidents without any explicit object detection, tracking, or local feature extraction. Experiments on the DAD, DoTA, DADA, and Nexar benchmarks are reported to yield higher average precision and mean time-to-accident than prior methods while also improving computational efficiency.

Significance. If the central empirical claims hold under rigorous validation, the work would be significant for real-time autonomous driving and ADAS applications. By showing that global-only representations can outperform object-centric pipelines, it could reduce the computational cost of accident anticipation and simplify deployment on edge devices. The approach also demonstrates effective transfer of large-scale pretrained video models to safety-critical prediction tasks.

major comments (3)
  1. [Method (§3)] The central claim that global VideoMAE-V2 features suffice without object-level cues is load-bearing, yet the architecture description does not include an ablation that isolates the contribution of the global-only design (e.g., a variant with added object detections or local patch features). Without this, it is impossible to determine whether reported gains stem from the global-feature hypothesis or from other modeling choices.
  2. [Experiments (§4)] The performance claims (higher AP and mTTA on four datasets) are presented without reported details on experimental protocol: number of random seeds, statistical significance tests, variance across runs, or exact baseline re-implementations. This gap directly affects verifiability of the efficiency and accuracy improvements asserted in the abstract.
  3. [Results and Discussion (§5)] No qualitative analysis or failure-case examination is provided to test whether global patch embeddings preserve the fine-grained relative motions (e.g., sudden cut-ins or pedestrian incursions) that drive many accidents in DAD/DoTA. Such analysis is required to address the risk that performance reflects dataset correlations rather than true anticipation capability.
minor comments (2)
  1. [Abstract] The abstract states performance improvements but omits numerical deltas or efficiency metrics (e.g., FPS or FLOPs); adding these would improve clarity.
  2. [Method (§3.2)] Notation for the graph module and transformer integration could be made more explicit (e.g., defining the adjacency matrix construction) to aid reproducibility.
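To illustrate what an explicit adjacency definition of the kind requested in the last minor comment could look like, here is a purely hypothetical sketch. It treats each temporal segment's global feature vector as a graph node and connects nodes that lie within a fixed temporal window, then row-normalizes the result. Nothing here is taken from the paper: the windowed connectivity, the self-loops, and the names `temporal_adjacency` and `row_normalize` are assumptions made only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def temporal_adjacency(num_nodes: int, window: int = 1, self_loops: bool = True) -> Matrix:
    """Connect node i to node j when their temporal indices differ by at most `window`."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(num_nodes):
            if i == j:
                adj[i][j] = 1.0 if self_loops else 0.0
            elif abs(i - j) <= window:
                adj[i][j] = 1.0
    return adj


def row_normalize(adj: Matrix) -> Matrix:
    """Scale each row to sum to 1 so that aggregation averages over neighbours."""
    normalized = []
    for row in adj:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized
```

Stating the construction at this level of detail (which nodes exist, when an edge is present, how weights are normalized) is what would let a reader reimplement the graph module unambiguously.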

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. We address each major comment below and commit to revisions that strengthen the manuscript's rigor and verifiability.

  1. Referee: [Method (§3)] The central claim that global VideoMAE-V2 features suffice without object-level cues is load-bearing, yet the architecture description does not include an ablation that isolates the contribution of the global-only design (e.g., a variant with added object detections or local patch features). Without this, it is impossible to determine whether reported gains stem from the global-feature hypothesis or from other modeling choices.

    Authors: We agree that an explicit ablation isolating the global-only design is necessary to substantiate the central hypothesis. In the revised manuscript we will add a controlled ablation study that compares the full VAGNet model against variants augmented with object detections (using an off-the-shelf detector) and with local patch features, while keeping all other components fixed. This will clarify whether the reported gains derive primarily from the global VideoMAE-V2 representation. revision: yes

  2. Referee: [Experiments (§4)] The performance claims (higher AP and mTTA on four datasets) are presented without reported details on experimental protocol: number of random seeds, statistical significance tests, variance across runs, or exact baseline re-implementations. This gap directly affects verifiability of the efficiency and accuracy improvements asserted in the abstract.

    Authors: We acknowledge that the current experimental section lacks sufficient protocol details for full reproducibility. In the revision we will report: (i) the number of random seeds used for training and evaluation, (ii) standard deviations across runs, (iii) results of statistical significance tests (e.g., paired t-tests against baselines), and (iv) precise descriptions of how each baseline was re-implemented, including any hyper-parameter choices and hardware settings. revision: yes
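The paired test mentioned in item (iii) can be sketched as follows, assuming that the proposed model and a baseline are each trained with the same set of random seeds, so that their per-seed scores form natural pairs. The helper name `paired_t_statistic` is hypothetical, and the sketch stops at the t statistic and degrees of freedom; turning these into a p-value needs a t-distribution table or a statistics library.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(
    model_scores: Sequence[float],
    baseline_scores: Sequence[float],
) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for per-seed paired scores."""
    if len(model_scores) != len(baseline_scores):
        raise ValueError("both methods must be evaluated on the same seeds")
    if len(model_scores) < 2:
        raise ValueError("at least two paired runs are required")

    # Work on the per-seed differences, which is what makes the test paired.
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if std_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")

    standard_error = std_diff / math.sqrt(len(diffs))
    return mean_diff / standard_error, len(diffs) - 1
```

A call such as `paired_t_statistic(vagnet_ap_per_seed, baseline_ap_per_seed)` would take one average-precision value per seed for each method; both argument names are placeholders for results that the revised manuscript would need to supply.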

  3. Referee: [Results and Discussion (§5)] No qualitative analysis or failure-case examination is provided to test whether global patch embeddings preserve the fine-grained relative motions (e.g., sudden cut-ins or pedestrian incursions) that drive many accidents in DAD/DoTA. Such analysis is required to address the risk that performance reflects dataset correlations rather than true anticipation capability.

    Authors: We concur that qualitative evidence is required to demonstrate that global embeddings capture the critical fine-grained motions. We will add a new subsection containing: (a) attention-map visualizations on representative DAD and DoTA sequences highlighting sudden cut-ins and pedestrian incursions, and (b) a failure-case analysis that categorizes errors and discusses whether they stem from limitations of global features versus other factors. revision: yes

Circularity Check

0 steps flagged

No circularity in empirical architecture proposal and benchmark evaluation

The paper proposes VAGNet as a DNN architecture that extracts global features via the external pretrained VideoMAE-V2 model, then processes them with transformer and graph modules to anticipate accidents from dashcam video. Performance is asserted solely via direct experiments on four independent external benchmark datasets (DAD, DoTA, DADA, Nexar), reporting higher AP, mTTA, and efficiency versus prior methods. No equations, derivations, fitted-parameter predictions, or self-citation chains appear in the abstract or description; the central claim does not reduce to its inputs by construction and remains falsifiable against the cited benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the pre-trained VideoMAE-V2 model providing adequate global features and on the new network architecture being effective for this task; no invented entities are introduced.

free parameters (1)
  • Neural network hyperparameters and weights
    Standard in deep learning; model parameters are optimized on the training splits of the benchmark datasets.
axioms (1)
  • domain assumption: Global features from VideoMAE-V2 capture sufficient information for accident anticipation without object-level details
    The method explicitly avoids explicit object detection and relies on this premise for both accuracy and efficiency.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 2 internal anchors

  [1] Road traffic injuries. https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries, 2023. [Online; accessed 24-December-2025]
  [2] Fred L Mannering, Venky Shankar, and Chandra R Bhat. Unobserved heterogeneity and the statistical analysis of highway accident data. Analytic methods in accident research, 11:1–16, 2016
  [3] Muhammad Monjurul Karim, Yu Li, and Ruwen Qin. Toward explainable artificial intelligence for early anticipation of traffic accidents. Transportation research record, 2676(6):743–755, 2022
  [4] Ting Zhang, Zixuan Wang, Hong Wang, and Jun Li. Intelligent defensive driving for autonomous vehicles: Framework, strategy and verification. Accident Analysis & Prevention, 226:108355, 2026
  [5] Konstantinos Mattas, Giovanni Albano, Riccardo Donà, Maria Christina Galassi, Ricardo Suarez-Bertoa, Sandor Vass, and Biagio Ciuffo. Driver models for the definition of safety requirements of automated vehicles in international regulations. application to motorway driving conditions. Accident Analysis & Prevention, 174:106743, 2022
  [6] Daofei Li, Yangye Jiang, Jiajie Zhang, and Bin Xiao. Smpc-based motion planning of automated vehicle when interacting with occluded pedestrians. IEEE Transactions on Intelligent Transportation Systems, 2024
  [7] Jiaxin Liu, Xiangyu Yan, Liang Peng, Lei Yang, Lingjun Zhang, Yuechen Luo, Yueming Tao, Ashton Yu Xuan Tan, Mu Li, Lei Zhang, et al. Seeing before observable: Potential risk reasoning in autonomous driving via vision language models. arXiv preprint arXiv:2511.22928, 2025
  [8] Sule Tekkesinoglu, Azra Habibovic, and Lars Kunze. Advancing explainable autonomous vehicle systems: A comprehensive review and research roadmap. ACM Transactions on Human-Robot Interaction, 14(3):1–46, 2025
  [9] Pranav Singh Chib and Pravendra Singh. Recent advancements in end-to-end autonomous driving using deep learning: A survey. IEEE Transactions on Intelligent Vehicles, 9(1):103–118, 2023
  [10] Henry X Liu and Shuo Feng. Curse of rarity for autonomous vehicles. nature communications, 15(1):4808, 2024
  [11] Yifan Zhang, Bingyi Kang, Bryan Hooi, Shuicheng Yan, and Jiashi Feng. Deep long-tailed learning: A survey. IEEE transactions on pattern analysis and machine intelligence, 45(9):10795–10816, 2023
  [12] Daniel Moura, Shizhan Zhu, and Orly Zvitia. Nexar dashcam collision prediction dataset and challenge. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2583–2591, 2025
  [13] Yu Yao, Xizi Wang, Mingze Xu, Zelin Pu, Yuchen Wang, Ella Atkins, and David Crandall. Dota: unsupervised detection of traffic anomaly in driving videos. IEEE transactions on pattern analysis and machine intelligence, 2022
  [14] Jianwu Fang, Dingxin Yan, Jiahuan Qiao, Jianru Xue, He Wang, and Sen Li. Dada-2000: Can driving accident be predicted by driver attention? analyzed by a benchmark. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 4303–4309. IEEE, 2019
  [15] Shih-Yuan Yu, Arnav Vaibhav Malawade, Deepan Muthirayan, Pramod P Khargonekar, and Mohammad Abdullah Al Faruque. Scene-graph augmented data-driven risk assessment of autonomous vehicle decisions. IEEE Transactions on Intelligent Transportation Systems, 23(7):7941–7951, 2021
  [16] Retrofit Collision Warning System Gives Older Vehicles A Safety Boost. https://trid.trb.org/View/1574810, 2018. [Online; accessed 27-October-2025]
  [17] David B Yoffie. Mobileye: The future of driverless cars. Harvard Business School Case, pages 715–421, 2014
  [18] Jianwu Fang, Jiahuan Qiao, Jianru Xue, and Zhengguo Li. Vision-based traffic accident detection and anticipation: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 2023
  [19] Wei Liu, Yafei Li, Tao Zhang, Yixiang Gao, Longsheng Wei, and Jun Chen. Ccaf-net: Cascade complementarity-aware fusion network for traffic accident prediction in dashcam videos. Neurocomputing, 624:129285, 2025
  [20] Yuanhong Zhong, Ge Yan, Ruyue Zhu, Ping Gan, and Xuerui Shen. Early traffic accident anticipation via feature consistency representation and soft label regression. ACM Transactions on Multimedia Computing, Communications and Applications, 2025
  [21] Nupur Thakur, PrasanthSai Gouripeddi, and Baoxin Li. Graph (graph): A nested graph-based framework for early accident anticipation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7533–7541, 2024
  [22] Muhammad Monjurul Karim, Yu Li, Ruwen Qin, and Zhaozheng Yin. A dynamic spatial-temporal attention network for early anticipation of traffic accidents. IEEE Transactions on Intelligent Transportation Systems, 23(7):9590–9600, 2022
  [23] Arnav Vaibhav Malawade, Shih-Yuan Yu, Brandon Hsu, Deepan Muthirayan, Pramod P Khargonekar, and Mohammad Abdullah Al Faruque. Spatiotemporal scene-graph embedding for autonomous vehicle collision prediction. IEEE Internet of Things Journal, 9(12):9379–9388, 2022
  [24] Wentao Bao, Qi Yu, and Yu Kong. Drive: Deep reinforced accident anticipation with visual explanation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7619–7628, 2021
  [25] Tomoyuki Suzuki, Hirokatsu Kataoka, Yoshimitsu Aoki, and Yutaka Satoh. Anticipating traffic accidents with adaptive loss and large-scale incident db. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3521–3529, 2018
  [26] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014
  [27] Vipooshan Vipulananthan, Kumudu Mohottala, Kavindu Chinthana, Nimsara Paramulla, and Charith Chitraranjan. Stagnet: A spatio-temporal graph and lstm framework for accident anticipation. IEEE Access, 13:213769–213779,
  [28] doi: 10.1109/ACCESS.2025.3645127
  [29] Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, and Yu Qiao. Videomae v2: Scaling video masked autoencoders with dual masking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14549–14560, 2023
  [30] Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? a new model and the kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017
  [31] Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. Slowfast networks for video recognition. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6201–6210, 2018. URL https://api.semanticscholar.org/CorpusID:54463801
  [32] Oriane Siméoni, Huy V Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. Dinov3. arXiv preprint arXiv:2508.10104, 2025
  [33] Fu-Hsiang Chan, Yu-Ting Chen, Yu Xiang, and Min Sun. Anticipating accidents in dashcam videos. In Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part IV 13, pages 136–153. Springer, 2017
  [34] Wentao Bao, Qi Yu, and Yu Kong. Uncertainty-based traffic accident anticipation with spatio-temporal relational learning. In Proceedings of the 28th ACM International Conference on Multimedia, pages 2682–2690, 2020
  [35] Farhan Mahmood, Daehyeon Jeong, and Jeha Ryu. A new approach to traffic accident anticipation with geometric features for better generalizability. IEEE Access, 11:29263–29274, 2023
  [36] Charith Chitraranjan, Vipooshan Vipulananthan, and Thuvarakan Sritharan. Vision-based collision warning systems with deep learning: A systematic review. Journal of Imaging, 11(2):64, 2025
  [37] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28, 2015
  [38] Yuto Kumamoto, Kento Ohtani, Daiki Suzuki, Minori Yamataka, and Kazuya Takeda. Aat-da: Accident anticipation transformer with driver attention. In Proceedings of the Winter Conference on Applications of Computer Vision, pages 1142–1151, 2025
  [39] Tianci Zhao, Xue Bai, Jianwu Fang, and Jianru Xue. Gated driver attention predictor. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pages 270–276. IEEE, 2023
  [40] Lei-Lei Li, Jianwu Fang, and Jianru Xue. Cognitive traffic accident anticipation. IEEE Intelligent Transportation Systems Magazine, 16(5):17–32, 2024
  [41] Jianwu Fang, Lei-Lei Li, Kuan Yang, Zhedong Zheng, Jianru Xue, and Tat-Seng Chua. Cognitive accident prediction in driving scenes: A multimodality benchmark. arXiv preprint arXiv:2212.09381, 2022
  [42] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017
  [43] Yunsheng Shi, Zhengjie Huang, Shikun Feng, Hui Zhong, Wenjin Wang, and Yu Sun. Masked label prediction: Unified message passing model for semi-supervised classification, 2021. URL https://arxiv.org/abs/2009.03509
  [44] Takaya Saito and Marc Rehmsmeier. The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets. PloS one, 10(3):e0118432, 2015
  [45] Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2636–2645, 2020
  [46] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017
  [47] Tianhang Wang, Kai Chen, Guang Chen, Bin Li, Zhijun Li, Zhengfa Liu, and Changjun Jiang. Gsc: A graph and spatio-temporal continuity based framework for accident anticipation. IEEE Transactions on Intelligent Vehicles, 9(1):2249–2261, 2023
  [48] Wenfeng Song, Shuai Li, Tao Chang, Ke Xie, Aimin Hao, and Hong Qin. Dynamic attention augmented graph network for video accident anticipation. Pattern Recognition, 147:110071, 2024
  [49] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4510–4520, 2018
  [50] THOP: A tool to count the FLOPs of PyTorch model. https://pypi.org/project/thop/, 2022. [Online; accessed 14-November-2025]