pith. machine review for the scientific record. sign in

arxiv: 2605.07859 · v1 · submitted 2026-05-08 · 💻 cs.CV

Recognition: no theorem link

EyeCue: Driver Cognitive Distraction Detection via Gaze-Empowered Egocentric Video Understanding

Authors on Pith no claims yet

Pith reviewed 2026-05-11 02:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords cognitive distraction detectioneye gazeegocentric videodriver attention modelingmultimodal fusionCogDrive datasetdriving video understandingroad safety
0
0 comments X

The pith

EyeCue detects driver cognitive distraction by modeling interactions between eye gaze and visual context in egocentric video.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Driver cognitive distraction occurs when thoughts unrelated to driving divert attention, even if the driver looks attentive and shows no obvious movements, making it a hidden but major cause of collisions. The paper presents EyeCue as a framework that fuses eye gaze data with egocentric video to track context-aware attention patterns over time. It also creates the CogDrive dataset by adding cognitive distraction annotations to four existing driving video collections to address limited data scale. On this dataset EyeCue reaches 74.38 percent accuracy and exceeds eleven baselines from six model families by more than seven points while maintaining over 70 percent accuracy across varied road types, times of day, and weather. These outcomes indicate that explicit modeling of gaze-context interactions improves detection of internal mental states in driving scenes.

Core claim

EyeCue is a gaze-empowered egocentric video understanding framework that detects driver cognitive distraction by integrating eye gaze with egocentric video to enable context-aware modeling of the driver's attention over time. On the introduced CogDrive dataset, formed by augmenting four existing driving datasets with cognitive distraction annotations, the framework achieves 74.38 percent accuracy, outperforming 11 baselines from six model families by over 7 percent, and sustains over 70 percent accuracy across diverse scenarios including different road types, times of day, and weather conditions.

What carries the argument

Gaze-context interaction modeling that captures how cognitive distraction appears in the changing relationship between where the driver looks and the surrounding visual scene.

If this is right

  • Cognitive distraction detection becomes feasible at scale using only gaze and video without needing explicit physical movement cues.
  • The same multimodal interaction approach generalizes to new driving conditions such as night driving or adverse weather.
  • Augmenting existing video datasets with targeted annotations overcomes data scarcity for training attention models.
  • Cross-modal fusion of gaze and scene context proves more effective than single-modality or non-interactive baselines for this task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Vehicle safety systems could embed similar gaze-video models to issue real-time alerts when cognitive distraction is inferred.
  • The interaction-modeling technique may transfer to other attention-critical settings such as pilot monitoring or industrial operators.
  • Future validation against physiological sensors could refine or replace purely annotation-based labels for mental-state detection.
  • The framework supplies a concrete way to study how visual attention drifts in dynamic real-world environments beyond driving.

Load-bearing premise

The cognitive distraction labels added to existing driving video datasets accurately reflect the driver's true internal mental state.

What would settle it

A controlled study that records simultaneous EEG or other brain-activity measures while drivers perform tasks with known cognitive loads and then checks whether the model's output classifications align with those physiological indicators.

Figures

Figures reproduced from arXiv: 2605.07859 by Abhijit Sarkar, Bo Ji, JinYi Yoon, Lang Zhang, Matthew Corbett.

Figure 1
Figure 1. Figure 1: Is the driver cognitively distracted? This figure shows a driver’s journey on the road, including distracted driving (top) and attentive driving (bottom). Video frames come from the DR(eye)VE dataset [Palazzi et al., 2018], which records the driver’s egocentric views and corresponding gaze points over time. We added a green dot to each raw frame to represent the driver’s gaze point at that time. Therefore,… view at source ↗
Figure 2
Figure 2. Figure 2: EyeCue architecture. (a) The video encoder extracts the global contextual information and fine-grained visual details. (b) The GDSQ module captures the relationship between visual context and eye gaze. (c) The gaze encoder obtains global attention patterns and frame-level gaze details. Finally, these three types of class tokens are concatenated and fed into a multilayer perceptron for classification. as a … view at source ↗
Figure 4
Figure 4. Figure 4: Example of gaze-integrated video preprocessing methods. [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Accuracy (%) under different preprocessing methods. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Examples of three types of driver distraction. These fig [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Examples of selected gaze-based patch tokens when the [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Examples of fixation density maps used in our annotation [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Grad-CAM visualizations on representative failure cases. [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Confusion matrix for EyeCue on the CogDrive dataset. [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗
read the original abstract

Driver cognitive distraction is a major cause of road collisions and remains difficult to detect. Unlike manual or visual distraction, cognitive distraction is diverted by thoughts unrelated to driving, even when the driver appears visually attentive and exhibits no explicit physical movements. In this work, we propose EyeCue, a gaze-empowered egocentric video understanding framework, to detect driver cognitive distraction. A key insight is that cognitive distraction manifests in the interaction between eye gaze and visual context. To capture this interaction, EyeCue integrates eye gaze with egocentric video to enable context-aware modeling of the driver's attention over time. Furthermore, to tackle the limited scale and diversity of existing datasets, we introduce CogDrive, a comprehensive multi-scenario dataset that augments four existing driving datasets with cognitive distraction annotations. Through extensive evaluations on CogDrive, we show that EyeCue achieves the highest accuracy of 74.38%, outperforming 11 baselines from 6 model families by over 7%. Notably, EyeCue can achieve an accuracy of over 70% across various driving scenarios (different road types, times of day, and weather conditions) with strong generalizability. These results highlight the importance of modeling gaze-context interactions and the effectiveness of cross-modal interaction modeling for multimodal cognitive distraction detection. Our codes and CogDrive dataset resources are available at https://github.com/langzhang2000/EyeCue.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes EyeCue, a multimodal framework that integrates eye gaze with egocentric video to model context-aware attention for detecting driver cognitive distraction (internal mental states unrelated to driving). It introduces CogDrive, a dataset created by augmenting four existing driving datasets with cognitive-distraction annotations, and reports that EyeCue achieves 74.38% accuracy on CogDrive while outperforming 11 baselines from 6 model families by more than 7%, with consistent results (>70% accuracy) across road types, times of day, and weather conditions. Code and dataset resources are released.

Significance. If the CogDrive labels validly capture internal cognitive states, the work would be significant for advancing driver monitoring by demonstrating the value of explicit gaze-context interaction modeling and by releasing a new multi-scenario benchmark. The public code and dataset release is a clear strength for reproducibility. However, the significance is limited by the absence of physiological validation for the labels, which directly affects whether the reported gains address the stated task of internal distraction detection.

major comments (2)
  1. [CogDrive dataset construction] CogDrive dataset construction: the cognitive distraction annotations rely on subjective video review of existing driving footage without reported physiological ground truth (EEG, fNIRS, or validated secondary-task protocols), inter-rater reliability statistics, or cross-validation against brain-activity measures. This is load-bearing for the central claim, because the 74.38% accuracy and >7% gains over baselines (Abstract and experimental section) become difficult to interpret if the labels primarily encode observable gaze or scene patterns rather than verified internal mental states.
  2. [Experimental evaluation] Experimental evaluation: no details are provided on baseline re-implementations, hyperparameter search, or statistical significance testing (e.g., McNemar or paired t-tests) for the reported performance differences. Without these, it is impossible to determine whether the outperformance is robust or could be explained by implementation variance.
minor comments (2)
  1. [Abstract] The abstract states 'over 7%' improvement but does not name the strongest baseline or the exact metric value for that baseline, reducing immediate clarity.
  2. [Method] Notation for the gaze-context fusion module could be clarified with an explicit equation or diagram reference in the method section to aid readers unfamiliar with egocentric video models.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback. We address each major comment below with clarifications on our methodology and commitments to revisions where feasible.

read point-by-point responses
  1. Referee: [CogDrive dataset construction] CogDrive dataset construction: the cognitive distraction annotations rely on subjective video review of existing driving footage without reported physiological ground truth (EEG, fNIRS, or validated secondary-task protocols), inter-rater reliability statistics, or cross-validation against brain-activity measures. This is load-bearing for the central claim, because the 74.38% accuracy and >7% gains over baselines (Abstract and experimental section) become difficult to interpret if the labels primarily encode observable gaze or scene patterns rather than verified internal mental states.

    Authors: We acknowledge the referee's concern regarding label validity. The CogDrive annotations were generated via expert review of egocentric videos, identifying cognitive distraction based on gaze patterns inconsistent with driving demands, scene context, and absence of external distractions. This follows established practices in the field for video-based labeling when source datasets lack physiological signals. We agree that physiological ground truth would strengthen the claims and that its absence is a limitation. In revision, we will expand the dataset section with detailed annotation protocols, annotator information, and any available inter-rater statistics. We will also explicitly discuss this limitation and its implications for interpreting the results. revision: partial

  2. Referee: [Experimental evaluation] Experimental evaluation: no details are provided on baseline re-implementations, hyperparameter search, or statistical significance testing (e.g., McNemar or paired t-tests) for the reported performance differences. Without these, it is impossible to determine whether the outperformance is robust or could be explained by implementation variance.

    Authors: We appreciate this observation. The revised manuscript will include a new subsection detailing the re-implementation of all 11 baselines, specifying architectures, pre-training, and exact hyperparameter search procedures (ranges and selected values for learning rate, batch size, etc.). We will also add statistical significance testing using McNemar's test for the key performance comparisons to confirm the robustness of the reported gains. revision: yes

standing simulated objections not resolved
  • Physiological validation (EEG/fNIRS or equivalent) for CogDrive labels, as no such data exists in the source datasets and cannot be retroactively obtained

Circularity Check

0 steps flagged

No circularity: empirical evaluation on held-out data with no self-referential derivations

full rationale

The paper introduces the EyeCue framework for gaze-context modeling and the CogDrive dataset with cognitive distraction annotations, then reports standard supervised classification accuracy (74.38%) and comparisons against 11 baselines on held-out splits. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The performance numbers are obtained via conventional train/test evaluation on the augmented dataset rather than any reduction of outputs to inputs by construction. The annotation process for labels is a data-preparation step, not a mathematical loop. This is a self-contained empirical ML contribution whose central claims do not collapse to tautology.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

With only the abstract available, the ledger is limited; the approach rests on the domain assumption that gaze-visual interactions reliably indicate cognitive states, with standard deep learning hyperparameters left unspecified.

free parameters (1)
  • model hyperparameters and training thresholds
    Standard in neural network training but not detailed in the abstract.
axioms (1)
  • domain assumption Cognitive distraction manifests detectably in the interaction between eye gaze and visual driving context over time
    This is the key insight stated in the abstract that enables the framework.

pith-pipeline@v0.9.0 · 5553 in / 1143 out tokens · 61822 ms · 2026-05-11T02:24:22.428556+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · 5 internal anchors

  1. [1]

    Structure and Interpretation of Computer Programs

    Harold Abelson and Gerald Jay Sussman and Julie Sussman. Structure and Interpretation of Computer Programs. 1985

  2. [2]

    Visual Information Extraction with Lixto

    Robert Baumgartner and Georg Gottlob and Sergio Flesca. Visual Information Extraction with Lixto. Proceedings of the 27th International Conference on Very Large Databases. 2001

  3. [3]

    Brachman and James G

    Ronald J. Brachman and James G. Schmolze. An overview of the KL-ONE knowledge representation system. Cognitive Science. 1985

  4. [4]

    Complexity results for nonmonotonic logics

    Georg Gottlob. Complexity results for nonmonotonic logics. Journal of Logic and Computation. 1992

  5. [5]

    Hypertree Decompositions and Tractable Queries

    Georg Gottlob and Nicola Leone and Francesco Scarcello. Hypertree Decompositions and Tractable Queries. Journal of Computer and System Sciences. 2002

  6. [6]

    Levesque

    Hector J. Levesque. Foundations of a functional approach to knowledge representation. Artificial Intelligence. 1984

  7. [7]

    Levesque

    Hector J. Levesque. A logic of implicit and explicit belief. Proceedings of the Fourth National Conference on Artificial Intelligence. 1984

  8. [8]

    On the compilability and expressive power of propositional planning formalisms

    Bernhard Nebel. On the compilability and expressive power of propositional planning formalisms. Journal of Artificial Intelligence Research. 2000

  9. [9]

    2023 IEEE Intelligent Vehicles Symposium (IV) , pages=

    Driver gaze fixation and pattern analysis in safety critical events , author=. 2023 IEEE Intelligent Vehicles Symposium (IV) , pages=. 2023 , organization=

  10. [10]

    Human Factors in Transportation , volume=

    A comprehensive safety analysis for gaze fixation of drivers to outside scene , author=. Human Factors in Transportation , volume=. 2022 , publisher=

  11. [11]

    2024 IEEE Intelligent Vehicles Symposium (IV) , pages=

    Semantic understanding of traffic scenes with large vision language models , author=. 2024 IEEE Intelligent Vehicles Symposium (IV) , pages=. 2024 , organization=

  12. [12]

    2023 IEEE Intelligent Vehicles Symposium (IV) , pages=

    Explainable driver activity recognition using video transformer in highly automated vehicle , author=. 2023 IEEE Intelligent Vehicles Symposium (IV) , pages=. 2023 , organization=

  13. [13]

    Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

    Egolife: Towards egocentric life assistant , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

  14. [14]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    From Gaze to Movement: Predicting Visual Attention for Autonomous Driving Human-Machine Interaction based on Programmatic Imitation Learning , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  15. [15]

    ICML , volume=

    Is space-time attention all you need for video understanding? , author=. ICML , volume=

  16. [16]

    Advances in neural information processing systems , volume=

    Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training , author=. Advances in neural information processing systems , volume=

  17. [17]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Mv-adapter: Multimodal video transfer learning for video text retrieval , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  18. [18]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    DGL: Dynamic global-local prompt tuning for text-video retrieval , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  19. [19]

    European Conference on Computer Vision , pages=

    Internvideo2: Scaling foundation models for multimodal video understanding , author=. European Conference on Computer Vision , pages=. 2024 , organization=

  20. [20]

    VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

    VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding , author=. arXiv preprint arXiv:2501.13106 , year=

  21. [21]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    Egovlpv2: Egocentric video-language pre-training with fusion in the backbone , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  22. [22]

    Egovideo: Exploring egocentric founda- tion model and downstream adaptation.arXiv preprint arXiv:2406.18070, 2024

    Egovideo: Exploring egocentric foundation model and downstream adaptation , author=. arXiv preprint arXiv:2406.18070 , year=

  23. [23]

    arXiv preprint arXiv:2208.04464 , year=

    In the eye of transformer: Global-local correlation for egocentric gaze estimation , author=. arXiv preprint arXiv:2208.04464 , year=

  24. [24]

    IEEE Transactions on Multimedia , volume=

    Driver yawning detection based on subtle facial action recognition , author=. IEEE Transactions on Multimedia , volume=. 2020 , publisher=

  25. [25]

    Human factors , volume=

    Measuring driver perception: Combining eye-tracking and automated road scene perception , author=. Human factors , volume=. 2022 , publisher=

  26. [26]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Slowfast networks for video recognition , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

  27. [27]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    An image is worth 16x16 words: Transformers for image recognition at scale , author=. arXiv preprint arXiv:2010.11929 , year=

  28. [28]

    Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

    Video-llava: Learning united visual representation by alignment before projection , author=. arXiv preprint arXiv:2311.10122 , year=

  29. [29]

    Advances in Neural Information Processing Systems , volume=

    Voila-a: Aligning vision-language models with user's gaze attention , author=. Advances in Neural Information Processing Systems , volume=

  30. [30]

    Scientific reports , volume=

    Novel method for rapid assessment of cognitive impairment using high-performance eye-tracking technology , author=. Scientific reports , volume=. 2019 , publisher=

  31. [31]

    Proceedings of the Winter Conference on Applications of Computer Vision , pages=

    Human Gaze Improves Vision Transformers by Token Masking , author=. Proceedings of the Winter Conference on Applications of Computer Vision , pages=

  32. [32]

    IEEE transactions on pattern analysis and machine intelligence , volume=

    Predicting the Driver's Focus of Attention: the DR (eye) VE Project , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2018 , publisher=

  33. [33]

    arXiv preprint arXiv:2504.00221 , year=

    GazeLLM: Multimodal LLMs incorporating Human Visual Attention , author=. arXiv preprint arXiv:2504.00221 , year=

  34. [34]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Videollm-online: Online video large language model for streaming video , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  35. [35]

    Advances in neural information processing systems , volume=

    Flashattention: Fast and memory-efficient exact attention with io-awareness , author=. Advances in neural information processing systems , volume=

  36. [36]

    Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=

    GazeVQA: A video question answering dataset for multiview eye-gaze task-oriented collaborations , author=. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=

  37. [37]

    A Short Note about Kinetics-600

    A short note about kinetics-600 , author=. arXiv preprint arXiv:1808.01340 , year=

  38. [38]

    The Kinetics Human Action Video Dataset

    The kinetics human action video dataset , author=. arXiv preprint arXiv:1705.06950 , year=

  39. [39]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Taco: Benchmarking generalizable bimanual tool-action-object understanding , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  40. [40]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Object-aware gaze target detection , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

  41. [41]

    IEEE Transactions on Automation Science and Engineering , volume=

    A temporal--spatial deep learning approach for driver distraction detection based on EEG signals , author=. IEEE Transactions on Automation Science and Engineering , volume=. 2021 , publisher=

  42. [42]

    IEEE Sensors Journal , volume=

    Smartphone inertial measurement unit data features for analyzing driver driving behavior , author=. IEEE Sensors Journal , volume=. 2023 , publisher=

  43. [43]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    Aide: A vision-driven multi-view, multi-modal, multi-tasking dataset for assistive driving perception , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  44. [44]

    Accident Analysis & Prevention , volume=

    Detection of driver manual distraction via image-based hand and ear recognition , author=. Accident Analysis & Prevention , volume=. 2020 , publisher=

  45. [45]

    Scientific reports , volume=

    Mind-wandering tends to occur under low perceptual demands during driving , author=. Scientific reports , volume=. 2016 , publisher=

  46. [46]

    2024 IEEE Intelligent Vehicles Symposium (IV) , pages=

    ViT-DD: Multi-task vision transformer for semi-supervised driver distraction detection , author=. 2024 IEEE Intelligent Vehicles Symposium (IV) , pages=. 2024 , organization=

  47. [47]

    Heliyon , volume=

    Risk assessment of driver performance in the oil and gas transportation industry: Analyzing the relationship between driver vigilance, attention, reaction time, and safe driving practices , author=. Heliyon , volume=. 2024 , publisher=

  48. [48]

    Open Access Emergency Medicine , pages=

    Measuring situation awareness in emergency setting: a systematic review of tools and outcomes , author=. Open Access Emergency Medicine , pages=. 2014 , publisher=

  49. [49]

    IEEE Sensors Journal , volume=

    Driver distraction from the EEG perspective: A review , author=. IEEE Sensors Journal , volume=. 2023 , publisher=

  50. [50]

    Scientific Reports , volume=

    Cognitive load classification of mixed reality human computer interaction tasks based on multimodal sensor signals , author=. Scientific Reports , volume=. 2025 , publisher=

  51. [51]

    Proceedings of the 2023 CHI conference on human factors in computing systems , pages=

    Bubbleu: Exploring augmented reality game design with uncertain ai-based interaction , author=. Proceedings of the 2023 CHI conference on human factors in computing systems , pages=

  52. [52]

    2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct) , pages=

    Industrial augmented reality: lessons learned from a long-term on-site assessment of augmented reality maintenance worker support systems , author=. 2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct) , pages=. 2022 , organization=

  53. [53]

    International Conference on Medical Image Computing and Computer-Assisted Intervention , pages=

    Efficient spatiotemporal learning of microscopic video for augmented reality-guided phacoemulsification cataract surgery , author=. International Conference on Medical Image Computing and Computer-Assisted Intervention , pages=. 2023 , organization=

  54. [54]

    Project Aria: A New Tool for Egocentric Multi-Modal AI Research

    Project aria: A new tool for egocentric multi-modal ai research , author=. arXiv preprint arXiv:2308.13561 , year=

  55. [55]

    2020 IEEE Intelligent Vehicles Symposium (IV) , pages=

    Toward real-time estimation of driver situation awareness: An eye-tracking approach based on moving objects of interest , author=. 2020 IEEE Intelligent Vehicles Symposium (IV) , pages=. 2020 , organization=

  56. [56]

    Advances in neural information processing systems , volume=

    Attention is all you need , author=. Advances in neural information processing systems , volume=

  57. [57]

    Driver Distraction Detection Methods: A Literature Review and Framework , year=

    Kashevnik, Alexey and Shchedrin, Roman and Kaiser, Christian and Stocker, Alexander , journal=. Driver Distraction Detection Methods: A Literature Review and Framework , year=

  58. [58]

    Transportation research interdisciplinary perspectives , volume=

    Evaluating driver cognitive distraction by eye tracking: From simulator to driving , author=. Transportation research interdisciplinary perspectives , volume=. 2020 , publisher=

  59. [59]

    ACM Transactions on Applied Perception (TAP) , volume=

    Discerning ambient/focal attention with coefficient K , author=. ACM Transactions on Applied Perception (TAP) , volume=. 2016 , publisher=

  60. [60]

    2020 IEEE International Symposium on Multimedia (ISM) , pages=

    Measuring driver situation awareness using region-of-interest prediction and eye tracking , author=. 2020 IEEE International Symposium on Multimedia (ISM) , pages=. 2020 , organization=

  61. [61]

    Transportation Research Part F: Traffic Psychology and Behaviour , volume=

    Visual search while driving: skill and awareness during inspection of the scene , author=. Transportation Research Part F: Traffic Psychology and Behaviour , volume=. 2002 , publisher=

  62. [62]

    2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) , pages=

    Multi-view region of interest prediction for autonomous driving using semi-supervised labeling , author=. 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) , pages=. 2020 , organization=

  63. [63]

    Psicothema , volume=

    Attention to speed and guide traffic signs with eye movements , author=. Psicothema , volume=

  64. [64]

    arXiv preprint arXiv:2503.09143 , year=

    Exo2ego: Exocentric knowledge guided mllm for egocentric video understanding , author=. arXiv preprint arXiv:2503.09143 , year=

  65. [65]

    IEEE transactions on intelligent transportation systems , volume=

    Real-time detection of driver cognitive distraction using support vector machines , author=. IEEE transactions on intelligent transportation systems , volume=. 2007 , publisher=

  66. [66]

    2016 international conference on advances in computing, communications and informatics (icacci) , pages=

    Driver's distraction detection based on gaze estimation , author=. 2016 international conference on advances in computing, communications and informatics (icacci) , pages=. 2016 , organization=

  67. [67]

    PloS one , volume=

    The distracted mind on the wheel: Overall propensity to mind wandering is associated with road crash responsibility , author=. PloS one , volume=. 2017 , publisher=

  68. [68]

    Transportation research record , volume=

    Systematic review of research on driver distraction in the context of advanced driver assistance systems , author=. Transportation research record , volume=. 2021 , publisher=

  69. [69]

    Nature Reviews Neuroscience , volume=

    Visual objects in context , author=. Nature Reviews Neuroscience , volume=. 2004 , publisher=

  70. [70]

    Visual cognition , volume=

    Eye movements and hazard perception in active and passive driving , author=. Visual cognition , volume=. 2015 , publisher=

  71. [71]

    2025 , url =

    Research Note: Distracted Driving in 2023 , author =. 2025 , url =

  72. [72]

    2023 , url =

    Budget Estimates: Fiscal Year 2024 , author =. 2023 , url =

  73. [73]

    Accident Analysis & Prevention , volume=

    Investigating the impact of driving automation systems on distracted driving behaviors , author=. Accident Analysis & Prevention , volume=. 2021 , publisher=

  74. [74]

    Distracted driving , volume=

    Driver distraction: A review of the literature , author=. Distracted driving , volume=

  75. [75]

    IEEE Transactions on Intelligent Vehicles , year=

    Bevgpt: Generative pre-trained foundation model for autonomous driving prediction, decision-making, and planning , author=. IEEE Transactions on Intelligent Vehicles , year=

  76. [76]

    Proceedings of the 15th biannual conference of the Italian SIGCHI chapter , pages=

    VisionARy: exploratory research on contextual language learning using AR glasses with ChatGPT , author=. Proceedings of the 15th biannual conference of the Italian SIGCHI chapter , pages=

  77. [77]

    2021 , note =

    Johnathan Silva , title =. 2021 , note =

  78. [78]

    IEEE Transactions on Reliability , volume=

    L-tla: A lightweight driver distraction detection method based on three-level attention mechanisms , author=. IEEE Transactions on Reliability , volume=. 2024 , publisher=

  79. [79]

    2020 international conference on information and communication technology convergence (ICTC) , pages=

    A hybrid deep learning approach for driver distraction detection , author=. 2020 international conference on information and communication technology convergence (ICTC) , pages=. 2020 , organization=

  80. [80]

    arXiv preprint arXiv:2506.23088 , year=

    Where, What, Why: Towards Explainable Driver Attention Prediction , author=. arXiv preprint arXiv:2506.23088 , year=

Showing first 80 references.