HST-HGN: Heterogeneous Spatial-Temporal Hypergraph Networks with Bidirectional State Space Models for Global Fatigue Assessment
Pith reviewed 2026-05-10 17:15 UTC · model grok-4.3
The pith
HST-HGN fuses hierarchical hypergraphs with bidirectional state space models to assess driver fatigue from untrimmed videos efficiently.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HST-HGN introduces a heterogeneous spatial-temporal hypergraph network that dynamically fuses pose-disentangled geometric topologies with multi-modal texture patches to model high-order facial synergies, paired with a Bi-Mamba module for bidirectional linear-complexity temporal filtering. This enables distinguishing ambiguous transient actions across their complete physiological lifecycles in untrimmed videos, achieving state-of-the-art performance with computational efficiency suitable for real-time in-cabin edge deployment.
What carries the argument
Hierarchical hypergraph fusion of pose-disentangled geometries and multi-modal texture patches together with Bi-Mamba bidirectional state space modeling, which jointly handles high-order spatial deformations and global temporal evolution.
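The hypergraph half of this machinery can be sketched with the standard spectral hypergraph convolution of Feng et al. [12], which the paper builds on. The incidence matrix, landmark features, and "eye region"/"mouth region" hyperedge grouping below are illustrative assumptions, not the paper's actual construction:

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_w=None):
    """One HGNN-style layer: X' = Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta.

    X: (N, F) node features, H: (N, E) incidence matrix,
    Theta: (F, F_out) weight matrix, edge_w: (E,) hyperedge weights.
    """
    N, E = H.shape
    W = np.ones(E) if edge_w is None else edge_w
    Dv = (H * W).sum(axis=1)            # node degrees
    De = H.sum(axis=0)                  # hyperedge degrees
    Dv_inv = np.diag(1.0 / np.sqrt(Dv))
    De_inv = np.diag(1.0 / De)
    A = Dv_inv @ H @ np.diag(W) @ De_inv @ H.T @ Dv_inv
    return np.maximum(A @ X @ Theta, 0.0)   # ReLU

# Toy example: 6 facial landmarks, 2 hyperedges (hypothetical
# "eye region" and "mouth region" groupings).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
H = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Theta = rng.normal(size=(4, 8))
out = hypergraph_conv(X, H, Theta)
print(out.shape)  # (6, 8)
```

Because a hyperedge connects a whole landmark group at once, one layer already mixes information across an entire facial region, which is the "high-order synergy" a pairwise graph edge cannot express.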
If this is right
- Distinguishes yawning from speaking by encompassing complete action lifecycles rather than isolated frames.
- Achieves state-of-the-art results across diverse fatigue benchmarks while maintaining linear temporal complexity.
- Enables real-time in-cabin edge deployment by balancing discriminative power and computational efficiency.
- Overcomes the modeling limits of both heavy architectures and traditional pairwise graph networks.
Where Pith is reading between the lines
- The same spatial-temporal fusion could extend to other long-duration subtle action tasks such as micro-expression or posture analysis.
- Linear complexity opens the possibility of scaling to hour-long untrimmed recordings without quadratic cost growth.
- Pairing the model with vehicle telemetry signals might further reduce false positives in real driving conditions.
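The scaling point in the second bullet can be made concrete: self-attention scores every frame against every other frame (quadratic in sequence length), while a state-space scan updates one state per frame (linear). A back-of-envelope count with illustrative frame rates:

```python
def attention_pairs(T):
    # Pairwise score matrix a transformer computes per head and layer.
    return T * T

def ssm_steps(T):
    # Recurrent scan updates: one state update per frame (per direction).
    return T

for minutes in (1, 10, 60):
    T = minutes * 60 * 30          # frames at an assumed 30 fps
    print(f"{minutes:3d} min: attention/SSM work ratio = "
          f"{attention_pairs(T) // ssm_steps(T)}")
```

The ratio equals T itself, so an hour-long recording costs a scan roughly 100,000x less relative work than full attention, which is why linear complexity matters for untrimmed video.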
Load-bearing premise
The fusion of pose-disentangled geometric topologies with multi-modal texture patches in a hierarchical hypergraph, combined with bidirectional Mamba filtering, can reliably distinguish ambiguous transient actions like yawning versus speaking across their complete physiological lifecycles.
What would settle it
A controlled test on untrimmed videos containing extended yawning and speaking sequences: if accuracy gains over standard graph or transformer baselines disappear or reverse while compute cost remains higher, the central claim fails.
Original abstract
It remains challenging to assess driver fatigue from untrimmed videos under constrained computational budgets, due to the difficulty of modeling long-range temporal dependencies in subtle facial expressions. Some existing approaches rely on computationally heavy architectures, whereas others employ traditional lightweight pairwise graph networks, despite their limited capacity to model high-order synergies and global temporal context. Therefore, we propose HST-HGN, a novel Heterogeneous Spatial-Temporal Hypergraph Network driven by Bidirectional State Space Models. Spatially, we introduce a hierarchical hypergraph network to fuse pose-disentangled geometric topologies with multi-modal texture patches dynamically. This formulation encapsulates high-order synergistic facial deformations, effectively overcoming the limitations of conventional methods. In temporal terms, a Bi-Mamba module with linear complexity is applied to perform bidirectional sequence modeling. This explicit temporal-evolution filtering enables the network to distinguish highly ambiguous transient actions, such as yawning versus speaking, while encompassing their complete physiological lifecycles. Extensive evaluations across diverse fatigue benchmarks demonstrate that HST-HGN achieves state-of-the-art performance. In particular, our method strikes a balance between discriminative power and computational efficiency, making it well-suited for real-time in-cabin edge deployment.
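The Bi-Mamba idea in the abstract, linear-time bidirectional filtering, can be caricatured with a diagonal linear recurrence run once forward and once backward over the sequence. The real selective scan (Gu and Dao [15]) makes the recurrence parameters input-dependent; this toy keeps them fixed, and all shapes and constants below are illustrative:

```python
import numpy as np

def linear_scan(x, a, b, c):
    """Diagonal linear SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
    x: (T, D); a, b, c: (D,) per-channel parameters. O(T) time."""
    h = np.zeros(x.shape[1])
    ys = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        ys[t] = c * h
    return ys

def bi_scan(x, a, b, c):
    """Bidirectional variant: a forward scan plus a backward scan,
    summed -- a crude stand-in for Bi-Mamba's two-direction fusion."""
    fwd = linear_scan(x, a, b, c)
    bwd = linear_scan(x[::-1], a, b, c)[::-1]
    return fwd + bwd

T, D = 300, 8                      # 10 s of 30 fps features, 8 channels
rng = np.random.default_rng(1)
x = rng.normal(size=(T, D))
a = np.full(D, 0.9)                # decay < 1: long but finite memory
y = bi_scan(x, a, np.ones(D), np.ones(D))
print(y.shape)  # (300, 8)
```

The backward pass is what lets an early frame "see" a later one, so a mouth opening can be read in the context of how the action ends, the lifecycle argument the abstract makes for separating yawning from speaking.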
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes HST-HGN, a Heterogeneous Spatial-Temporal Hypergraph Network with Bidirectional State Space Models (Bi-Mamba) for driver fatigue assessment from untrimmed videos. Spatially, a hierarchical hypergraph fuses pose-disentangled geometric topologies with multi-modal texture patches to capture high-order facial synergies; temporally, Bi-Mamba provides linear-complexity bidirectional filtering to model complete physiological lifecycles and distinguish ambiguous actions such as yawning versus speaking. The central claim is that this architecture achieves state-of-the-art performance while balancing discriminative power and efficiency, making it suitable for real-time in-cabin edge deployment.
Significance. If the experimental claims are substantiated, the work could meaningfully advance real-time fatigue monitoring by offering a more expressive spatial model than pairwise graphs and a more efficient temporal model than transformers, with potential impact on automotive safety systems that require both accuracy on subtle cues and low latency on edge hardware.
major comments (2)
- [Abstract] Abstract: The assertions of state-of-the-art performance across diverse fatigue benchmarks and suitability for real-time edge deployment rest entirely on unshown quantitative results; no accuracy metrics, baseline comparisons, ablation tables, error bars, or dataset details are supplied, rendering the central empirical claim unverifiable from the manuscript.
- [Abstract] Abstract and §4 (presumed experiments): No end-to-end latency, FPS, or memory measurements on representative edge hardware (e.g., Jetson) are reported, nor are ablations that isolate the contribution of the hierarchical hypergraph versus Bi-Mamba; without these, the efficiency half of the claim cannot be evaluated against prior graph or transformer baselines.
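The measurement the second major comment asks for is straightforward to specify. A minimal sketch of a per-frame latency/FPS harness, where `infer` is a placeholder for the deployed model's forward pass on the target device, not the paper's actual benchmark code:

```python
import time
import statistics

def benchmark(infer, frame, warmup=10, iters=100):
    """Median per-call latency and implied FPS for a callable `infer`.
    Warmup runs are discarded to avoid cold-start effects."""
    for _ in range(warmup):
        infer(frame)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        infer(frame)
        times.append(time.perf_counter() - t0)
    lat = statistics.median(times)
    return lat, 1.0 / lat

# Stand-in workload; on Jetson-class hardware this would wrap the
# model's forward pass on a preprocessed frame.
lat, fps = benchmark(lambda f: sum(f), list(range(10_000)))
print(f"median latency {lat * 1e3:.3f} ms, {fps:.0f} FPS")
```

Reporting the median (plus a high percentile) rather than the mean avoids letting occasional scheduler stalls dominate the headline number.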
minor comments (1)
- [Abstract] Abstract: The phrase 'global fatigue assessment' is introduced without a precise operational definition distinguishing it from local action classification, which could be clarified for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each point below and will revise the manuscript to improve the accessibility and completeness of the empirical claims.
Point-by-point responses
-
Referee: [Abstract] Abstract: The assertions of state-of-the-art performance across diverse fatigue benchmarks and suitability for real-time edge deployment rest entirely on unshown quantitative results; no accuracy metrics, baseline comparisons, ablation tables, error bars, or dataset details are supplied, rendering the central empirical claim unverifiable from the manuscript.
Authors: The full manuscript presents all requested quantitative details in Section 4, including accuracy metrics, baseline comparisons, ablation tables with error bars, and dataset specifications across multiple fatigue benchmarks. The abstract summarizes these findings at a high level due to length constraints. To make the central claims directly verifiable, we will revise the abstract to include key numerical results such as the top accuracy scores and efficiency gains. revision: yes
-
Referee: [Abstract] Abstract and §4 (presumed experiments): No end-to-end latency, FPS, or memory measurements on representative edge hardware (e.g., Jetson) are reported, nor are ablations that isolate the contribution of the hierarchical hypergraph versus Bi-Mamba; without these, the efficiency half of the claim cannot be evaluated against prior graph or transformer baselines.
Authors: Section 4 of the manuscript reports computational complexity and efficiency metrics along with initial ablation studies. We agree that explicit edge-device measurements and isolated component ablations are important for substantiating the efficiency claims. We will revise the manuscript to add end-to-end latency, FPS, and memory results on Jetson hardware and expand the ablations to separately quantify the hierarchical hypergraph and Bi-Mamba contributions relative to graph and transformer baselines. revision: yes
Circularity Check
No circularity: architecture proposal validated by external benchmarks
Full rationale
The paper proposes HST-HGN as a novel combination of hierarchical hypergraph fusion (pose-disentangled geometries + multi-modal texture patches) and Bi-Mamba bidirectional sequence modeling for fatigue assessment in untrimmed videos. All central claims of SOTA performance, discriminative power for ambiguous actions, and real-time edge suitability are grounded in extensive evaluations on diverse fatigue benchmarks rather than any internal derivation that reduces to fitted inputs or self-referential definitions. No equations, self-citations as uniqueness theorems, or ansatzes are presented that would create circularity; the model design is justified by its stated capacity to capture high-order synergies and linear-complexity temporal context, with empirical results providing independent external support.
Axiom & Free-Parameter Ledger
free parameters (1)
- model hyperparameters
axioms (1)
- domain assumption: Facial expressions in untrimmed videos contain sufficient high-order synergistic information to assess global fatigue states.
Reference graph
Works this paper leans on
-
[1]
Samy Abd El-Nabi, Walid El-Shafai, El-Sayed M. El-Rabaie, Khalil F. Ramadan, Fathi E. Abd El-Samie, and Saeed Mohsen. 2024. Machine learning and deep learning techniques for driver fatigue and drowsiness detection: a review. Multimedia Tools and Applications 83, 3 (2024), 9441–9477
2024
-
[2]
Shabnam Abtahi, Mona Omidyeganeh, Shervin Shirmohammadi, and Behnoosh Hariri. 2020. YawDD: Yawning Detection Dataset
2020
-
[3]
Safwan Mahmood Al-Selwi, Mohd Fadzil Hassan, Said Jadid Abdulkadir, Amgad Muneer, et al. 2023. LSTM inefficiency in long-term dependencies regression problems. Journal of Advanced Research in Applied Sciences and Engineering Technology 30, 3 (2023), 16–31
2023
-
[4]
Xiaopeng An, Lu Su, Qi Yang, Bo Shen, Linhua Gan, Jia jun Ji, Jian Wang, and Haifeng Su. 2025. A spatiotemporal hypergraph self-attention neural networks framework for the identification and pharmacological efficacy assessment of Parkinson's disease motor symptoms. NPJ Parkinson's Disease 11 (2025)
2025
-
[5]
Jing Bai, Wentao Yu, Zhu Xiao, Vincent Havyarimana, Amelia C. Regan, Hongbo Jiang, and Licheng Jiao. 2022. Two-Stream Spatial–Temporal Graph Convolutional Networks for Driver Drowsiness Detection. IEEE Transactions on Cybernetics 52, 12 (2022), 13821–13833
2022
-
[6]
Gedas Bertasius, Heng Wang, and Lorenzo Torresani. 2021. Is Space-Time Attention All You Need for Video Understanding? In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 813–824
2021
-
[7]
Shengli Cao, Peihua Feng, Wei Kang, Zeyi Chen, and Bo Wang. 2025. Optimized driver fatigue detection method using multimodal neural networks. Scientific Reports 15, 1 (2025), 12240
2025
-
[8]
Joao Carreira and Andrew Zisserman. 2017. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4724–4733
2017
-
[9]
Shuxiang Fa, Xiaohui Yang, Shiyuan Han, Zhiquan Feng, and Yuehui Chen. 2023. Multi-scale spatial–temporal attention graph convolutional networks for driver fatigue detection. Journal of Visual Communication and Image Representation 93 (2023), 103826
2023
-
[10]
Zunguan Fan, Yifan Feng, Kang Wang, and Xiaoli Li. 2024. Multi-Modal Temporal Hypergraph Neural Network for Flotation Condition Recognition. Entropy 26, 3 (2024)
2024
-
[11]
Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. 2019. SlowFast Networks for Video Recognition. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 6201–6210
2019
-
[12]
Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. 2019. Hypergraph Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence 33, 01 (Jul. 2019), 3558–3565
2019
-
[13]
Biying Fu, Fadi Boutros, Chin-Teng Lin, and Naser Damer. 2024. A Survey on Drowsiness Detection: Modern Applications and Methods. IEEE Transactions on Intelligent Vehicles 9, 11 (2024), 7279–7300
2024
- [14]
-
[15]
Albert Gu and Tri Dao. 2024. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752 [cs.LG]
2024
-
[16]
Albert Gu, Karan Goel, and Christopher Ré. 2021. Efficiently Modeling Long Sequences with Structured State Spaces. arXiv abs/2111.00396 (2021)
2021
-
[17]
Qing Han, Shimiao Cui, Weidong Min, Cong Yan, Li Liu, Feng Ning, and Li Li. 2025. A dense multi-pooling convolutional network for driving fatigue detection. Scientific Reports 15, 1 (2025), 15518
2025
-
[19]
Osama F. Hassan, Ahmed F. Ibrahim, Ahmed Gomaa, M. A. Makhlouf, and Bassel Hafiz. 2025. Real-time driver drowsiness detection using transformer architectures: a novel deep learning approach. Scientific Reports 15, 1 (2025), 17493
2025
-
[20]
Rui Huang, Yan Wang, Zijian Li, Zeyu Lei, and Yufan Xu. 2022. RF-DCM: Multi-Granularity Deep Convolutional Model Based on Feature Recalibration and Fusion for Driver Fatigue Detection. IEEE Transactions on Intelligent Transportation Systems 23, 1 (2022), 630–640
2022
-
[21]
Md Mohaiminul Islam and Gedas Bertasius. 2022. Long Movie Clip Classification with State-Space Video Models. Springer-Verlag, Berlin, Heidelberg, 87–104
2022
-
[22]
Fan Jiang, Qionghao Huang, Xiaoyong Mei, Quanlong Guan, Yaxin Tu, Weiqi Luo, and Changqin Huang. 2023. Face2Nodes: Learning facial expression representations with relation-aware dynamic graph convolution networks. Information Sciences 649 (2023), 119640
2023
-
[23]
Davis E. King. 2009. Dlib-ml: A Machine Learning Toolkit. J. Mach. Learn. Res. 10 (Dec. 2009), 1755–1758
2009
-
[24]
Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, and Yu Qiao. 2024. VideoMamba: State Space Model for Efficient Video Understanding. In European Conference on Computer Vision. Springer, 237–255
2024
-
[25]
Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, Jianbin Jiao, and Yunfan Liu. 2024. VMamba: Visual State Space Model. In Advances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (Eds.), Vol. 37. Curran Associates, Inc., 103031–103063
2024
-
[26]
Yansha Lu, Chunsheng Liu, Faliang Chang, Hui Liu, and Hengqiang Huan. 2023. JHPFA-Net: Joint Head Pose and Facial Action Network for Driver Yawning Detection Across Arbitrary Poses in Videos. IEEE Transactions on Intelligent Transportation Systems 24, 11 (2023), 11850–11863
2023
-
[27]
Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, Wan-Teh Chang, Wei Hua, Manfred Georg, and Matthias Grundmann. 2019. MediaPipe: A Framework for Building Perception Pipelines. arXiv:1906.08172 [cs.DC]
2019
- [28]
-
[29]
Abid Ali Minhas, Sohail Jabbar, Muhammad Farhan, and Muhammad Najam ul Islam. 2022. A smart analysis of driver fatigue and drowsiness detection using convolutional neural networks. Multimedia Tools and Applications 81, 19 (2022), 26969–26986
2022
-
[30]
Luntian Mou, Chao Zhou, Pengtao Xie, Pengfei Zhao, Ramesh Jain, Wen Gao, and Baocai Yin. 2023. Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion. IEEE Transactions on Multimedia 25 (2023), 529–542
2023
-
[31]
Juan Diego Ortega, Neslihan Kose, Paola Cañas, Min-An Chao, Alexander Unnervik, Marcos Nieto, Oihana Otaegui, and Luis Salgado. 2020. DMD: A Large-Scale Multi-modal Driver Monitoring Dataset for Attention and Alertness Analysis. In Computer Vision – ECCV 2020 Workshops, Adrien Bartoli and Andrea Fusiello (Eds.). Springer International Publishing, Cham...
2020
-
[32]
Jing Ren, Suyu Ma, Hong Jia, Xiwei Xu, Ivan Lee, Haytham Fayek, Xiaodong Li, and Feng Xia. 2025. LiteFat: Lightweight Spatio-Temporal Graph Learning for Real-Time Driver Fatigue Detection. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 8059–8066
2025
-
[33]
Shaibal Saha and Lanyu Xu. 2025. Vision transformers on the edge: A comprehensive survey of model compression and acceleration strategies. Neurocomputing 643 (2025), 130417
2025
-
[34]
Gulbadan Sikander and Shahzad Anwar. 2019. Driver Fatigue Detection Systems: A Review. IEEE Transactions on Intelligent Transportation Systems 20, 6 (2019), 2339–2352
2019
-
[35]
Shriyank Somvanshi, Md Monzurul Islam, Mahmuda Sultana Mimi, Sazzad Bin Bashar Polock, Gaurab Chhetri, and Subasish Das. 2025. From S4 to Mamba: A Comprehensive Survey on Structured State Space Models. arXiv:2503.18970 [cs.LG]
2025
-
[36]
Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, and Furu Wei. 2023. Retentive Network: A Successor to Transformer for Large Language Models. arXiv:2307.08621 [cs.CL]
2023
-
[37]
Zhichao Sun, Yinan Miao, Jun Young Jeon, Yeseul Kong, and Gyuhae Park. 2023. Facial feature fusion convolutional neural network for driver fatigue detection. Engineering Applications of Artificial Intelligence 126 (2023), 106981
2023
-
[38]
Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, and Junwei Liang. 2024. VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2024), 5663–5673
2024
-
[40]
Zhan Tong, Yibing Song, Jue Wang, and Limin Wang. 2022. VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. Curran Associates, Inc., 10078–10093
2022
-
[41]
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning Spatiotemporal Features with 3D Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision. 4489–4497
2015
-
[43]
Jue Wang, Wenjie Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, and Raffay Hamid. 2023. Selective Structured State-Spaces for Long-Form Video Understanding. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), 6387–6397
2023
-
[44]
Weizheng Wang, Le Mao, Baijian Yang, Guohua Chen, and Byung-Cheol Min
- [45]
-
[46]
Yi Wang, Haoran Luo, Luyang Meng, and Yuying Fan. 2026. MST-HGCN: A multimodal spatio-temporal hypergraph convolutional network for infantile spasms detection. Journal of King Saud University Computer and Information Sciences (2026)
2026
-
[47]
Jasper S. Wijnands, Jason Thompson, Gideon D. A. Aschwanden, and Mark Stevenson. 2020. Real-time monitoring of driver drowsiness on mobile platforms using 3D neural networks. Neural Computing and Applications 32, 13 (2020), 9731–9743
2020
-
[48]
Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, and Ross Girshick. 2019. Long-Term Feature Banks for Detailed Video Understanding. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 284–293
2019
-
[49]
Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. 2021. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 22419–22430
2021
-
[50]
Zhize Wu, Yue Ding, Long Wan, Teng Li, and Fudong Nian. 2025. Local and global self-attention enhanced graph convolutional network for skeleton-based action recognition. Pattern Recognition 159 (2025), 111106
2025
-
[51]
Cuiliu Yang and Zhao Pei. 2023. Long-Short Term Spatio-Temporal Aggregation for Trajectory Prediction. IEEE Transactions on Intelligent Transportation Systems 24, 4 (2023), 4114–4126
2023
-
[52]
Cong Yang, Zhenyu Yang, Weiyu Li, and John See. 2023. FatigueView: A Multi-Camera Video Dataset for Vision-based Drowsiness Detection. IEEE Transactions on Intelligent Transportation Systems 24, 1 (2023), 233–246
2023
-
[53]
Lie Yang, Haohan Yang, Henglai Wei, Zhongxu Hu, and Chen Lv. 2024. Video-Based Driver Drowsiness Detection With Optimised Utilization of Key Facial Features. IEEE Transactions on Intelligent Transportation Systems 25, 7 (2024), 6938–6950
2024
-
[54]
Zhimin Zhang, Hongmei Wang, Qian You, Liming Chen, and Huansheng Ning. 2024. A novel temporal adaptive fuzzy neural network for facial feature based fatigue assessment. Expert Systems with Applications 252 (2024), 124124
2024
-
[56]
Xia Zhao, Limin Wang, Yufei Zhang, Xuming Han, Muhammet Deveci, and Milan Parmar. 2024. A review of convolutional neural networks in computer vision. Artificial Intelligence Review 57 (2024), 99
2024
-
[57]
Zuopeng Zhao, Nana Zhou, Lan Zhang, Hualin Yan, Yi Xu, and Zhongxin Zhang. 2020. Driver Fatigue Detection Based on Convolutional Neural Networks Using EM-CNN. Computational Intelligence and Neuroscience 2020, 1 (2020), 7251280
2020
-
[59]
Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 11106–11115
2021