
arxiv: 2605.04752 · v1 · submitted 2026-05-06 · 💻 cs.CV · cs.AI

Hybrid Congestion Classification Framework Using Flow-Guided Attention and Empirical Mode Decomposition

Pith reviewed 2026-05-08 17:04 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords: traffic congestion · optical flow · empirical mode decomposition · attention mechanism · video classification · spatiotemporal modeling · hybrid framework · motion analysis

The pith

Flow-guided attention and empirical mode decomposition together classify traffic congestion levels more effectively by integrating spatial motion cues with adaptive temporal analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper seeks to establish that a unified video analysis framework can overcome the separate weaknesses of appearance-based and signal-based methods for detecting road congestion. By using optical flow to direct attention to moving parts of the scene and applying empirical mode decomposition to break down motion patterns into intrinsic components, the approach captures both location and timing of traffic flow. A sympathetic reader would care because reliable congestion classification from existing camera feeds could support better traffic control systems and reduce reliance on physical sensors. The reported results indicate that this combination leads to high accuracy and stability under different conditions.

Core claim

The authors claim that their FLO-EMD model enables effective classification of light, medium, and heavy congestion through two linked mechanisms. First, dense optical flow guides attention so that RGB features are refined toward motion-relevant regions. Second, empirical mode decomposition is applied to aggregated flow statistics to obtain intrinsic temporal components, which are then fused with learned spatiotemporal representations.
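
To make "aggregated flow statistics" concrete, here is a minimal sketch. It assumes each frame pair's dense flow is a grid of (u, v) vectors and reduces it to two per-frame scalars: the mean flow magnitude and the fraction of pixels moving faster than a threshold. The choice of descriptors, the function name `motion_trace`, and the threshold value are illustrative assumptions, not the configuration reported in the paper.

```python
import math

def motion_trace(flows, moving_threshold=1.0):
    """Reduce a sequence of dense flow fields to per-frame scalar descriptors.

    flows: list of T-1 fields, each an H x W grid of (u, v) tuples.
    Returns two lists of length T-1: mean magnitude and moving-pixel fraction.
    Both descriptors and the threshold are assumptions made for illustration.
    """
    mean_mag, moving_frac = [], []
    for field in flows:
        mags = [math.hypot(u, v) for row in field for (u, v) in row]
        mean_mag.append(sum(mags) / len(mags))
        moving_frac.append(sum(m > moving_threshold for m in mags) / len(mags))
    return mean_mag, moving_frac

# Two toy 2x2 flow fields: one mostly static, one with uniform motion.
flows = [
    [[(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]],
    [[(3.0, 4.0), (3.0, 4.0)], [(3.0, 4.0), (3.0, 4.0)]],
]
mean_mag, moving_frac = motion_trace(flows)
```

Each returned list is a one-dimensional motion trace over time, which is the kind of signal a temporal decomposition such as EMD can then operate on.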

What carries the argument

The hybrid FLO-EMD architecture that links motion evidence from optical flow to spatial feature selection through attention and performs data-adaptive temporal characterization via empirical mode decomposition.

If this is right

  • The combined model reaches 97.5% overall test accuracy and a weighted F1 of 0.9742 in experiments on 1,050 five-second clips.
  • It outperforms several established baseline methods.
  • Performance stays robust across the varied conditions in the four surveillance networks.
  • Ablation experiments show the specific contributions of the EMD step, the number of intrinsic mode functions, and the motion descriptors used.
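
For readers less familiar with the headline metrics, the sketch below shows how overall accuracy and support-weighted F1 are typically computed from a confusion matrix. The counts in `toy` are hypothetical and are not the paper's results; the function names are illustrative.

```python
def weighted_f1(confusion):
    """Support-weighted F1 from a square confusion matrix.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    score = 0.0
    for k in range(n):
        tp = confusion[k][k]
        support = sum(confusion[k])                        # true count of class k
        predicted = sum(confusion[i][k] for i in range(n))  # predicted count of class k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * support / total                      # weight by class support
    return score

def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total

# Hypothetical counts for a three-class test set (light, medium, heavy).
toy = [[50, 2, 0],
       [3, 45, 4],
       [0, 1, 52]]
toy_accuracy = accuracy(toy)
toy_f1 = weighted_f1(toy)
```

Because the weights are class supports, weighted F1 tracks accuracy closely when classes are balanced and diverges from it when one class dominates.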

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be extended to predict how congestion evolves over longer periods, rather than only classifying the current state.
  • The method's emphasis on motion might make it suitable for low-light or poor visibility scenarios where color cues fail.
  • Integrating such a system with existing traffic management software could supply fine-grained congestion-level information for dynamic signal timing.

Load-bearing premise

The selected video clips and motion features sufficiently capture the essential variations in traffic behavior without introducing selection bias or overfitting through post-hoc choices.

What would settle it

If the trained model were evaluated on a new collection of traffic video clips from additional locations or different times and showed substantially reduced classification accuracy, the claim of robust high performance would be disproved.

Figures

Figures reproduced from arXiv: 2605.04752 by Armstrong Aboah, Blessing Agyei Kyem, Eugene Kofi Okrah Denteh, Joshua Kofi Asamoah.

Figure 1: Overview of the FLO-EMD framework. The architecture processes traffic video sequences through parallel RGB and optical flow backbones, employs flow-guided attention mechanisms for spatial feature enhancement, integrates EMD-based temporal analysis of motion statistics, and fuses multimodal features through bidirectional LSTM encoding for final classification.
Figure 2: RGB backbone: hierarchical convolutions (64, 128, 256, 512 channels) with decreasing spatial resolution, followed by global average pooling to obtain a fixed-dimensional frame representation.

Optical-flow backbone: This mirrors the RGB backbone design but operates on dense flow fields of shape B × (T−1) × H × W × 2. The first convolution adapts to the two-channel input (u, v) and is followed by the same se…
Figure 3: Flow-guided channel attention mechanism. RGB and optical flow features undergo global average and maximum pooling, followed by processing through shared MLPs to generate channel attention weights that emphasize motion-relevant feature channels for traffic analysis.
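
The channel attention described in the Figure 3 caption can be sketched in plain Python as follows. The caption does not say how the pooled RGB and flow statistics are combined, so this sketch assumes the four shared-MLP outputs (average and max pooled, for each modality) are summed before a sigmoid, in the style of the Convolutional Block Attention Module. It also assumes both feature maps have the same number of channels. All function names and weights are hypothetical.

```python
import math

def global_pools(feat):
    """feat: list of C channels, each an H x W grid. Returns (avg, max) vectors."""
    flat = [[v for row in ch for v in row] for ch in feat]
    return [sum(c) / len(c) for c in flat], [max(c) for c in flat]

def mlp(vec, w1, w2):
    """Two-layer bottleneck MLP with ReLU; w1 is R x C, w2 is C x R."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def flow_guided_channel_attention(rgb, flow, w1, w2):
    """Reweight RGB channels using pooled statistics from both modalities.

    Assumption: the four MLP outputs are summed before the sigmoid; the
    source does not specify the combination rule.
    """
    logits = [0.0] * len(rgb)
    for feat in (rgb, flow):
        for pooled in global_pools(feat):
            logits = [a + b for a, b in zip(logits, mlp(pooled, w1, w2))]
    weights = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    refined = [[[w * v for v in row] for row in ch] for w, ch in zip(weights, rgb)]
    return weights, refined

# Toy features with C=2 channels on a 2x2 grid, bottleneck size R=1.
rgb = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
flow = [[[0.0, 1.0], [0.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.5, 0.5]]
w2 = [[1.0], [-1.0]]
weights, refined = flow_guided_channel_attention(rgb, flow, w1, w2)
```

Sharing one MLP across all pooled vectors keeps the parameter count small and forces RGB and flow statistics into a common scoring space, which is one plausible reading of "shared MLPs" in the caption.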
Figure 4: Flow-guided spatial attention mechanism. Channel-refined RGB features are combined with flow magnitude information through channel-wise pooling operations. The concatenated spatial maps undergo convolution and sigmoid activation to produce spatial attention weights that focus on motion-active regions while suppressing static background elements.
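
A minimal sketch of the spatial attention in the Figure 4 caption follows. It assumes the stacked input consists of three maps (channel-wise mean of the RGB features, channel-wise max of the RGB features, and a flow magnitude map) and that a single zero-padded convolution produces the attention logits. The kernel size, the number of stacked maps, and all names are assumptions for illustration.

```python
import math

def channel_pool(feat):
    """Collapse C x H x W features into per-pixel mean and max maps (H x W)."""
    h, w = len(feat[0]), len(feat[0][0])
    mean = [[sum(ch[i][j] for ch in feat) / len(feat) for j in range(w)]
            for i in range(h)]
    peak = [[max(ch[i][j] for ch in feat) for j in range(w)] for i in range(h)]
    return mean, peak

def conv_same(maps, kernels, bias=0.0):
    """Zero-padded cross-correlation summed over maps; one odd square kernel per map."""
    h, w = len(maps[0]), len(maps[0][0])
    r = len(kernels[0]) // 2
    out = [[bias] * w for _ in range(h)]
    for m, k in zip(maps, kernels):
        for i in range(h):
            for j in range(w):
                for di in range(-r, r + 1):
                    for dj in range(-r, r + 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:
                            out[i][j] += k[di + r][dj + r] * m[ii][jj]
    return out

def flow_guided_spatial_attention(rgb, flow_mag, kernels, bias=0.0):
    """Scale every RGB channel by a sigmoid attention map driven by pooled maps."""
    mean, peak = channel_pool(rgb)
    logits = conv_same([mean, peak, flow_mag], kernels, bias)
    attn = [[1.0 / (1.0 + math.exp(-z)) for z in row] for row in logits]
    refined = [[[a * v for a, v in zip(arow, vrow)]
                for arow, vrow in zip(attn, ch)] for ch in rgb]
    return attn, refined

# Toy case: uniform RGB features, motion only at the center pixel.
rgb = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
flow_mag = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
zero = [[0.0] * 3 for _ in range(3)]
center = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
attn, refined = flow_guided_spatial_attention(rgb, flow_mag, [zero, zero, center], bias=-2.0)
```

With these hand-picked kernels the attention map is high only where flow magnitude is high, which mirrors the caption's stated goal of focusing on motion-active regions and suppressing static background.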
Figure 5: Dataset diversity showing various traffic conditions and environmental scenarios: (a) light traffic flow on a multi-lane highway under clear daytime conditions, (b) moderate traffic density with well-spaced vehicles during clear weather, (c) increased traffic density showing more congested but still flowing conditions, (d) highway infrastructure with moderate traffic under overcast conditions, (e) modera…
Figure 6: Accurate classification examples showing the model's ability to correctly identify light, medium, and heavy traffic conditions with high confidence scores ranging from 97.40% to 99.20%. The attention heatmaps demonstrate consistent focus on traffic-relevant regions across different congestion levels.

[Plot residue from the Figure 7 confusion matrix. Axes: True Label (rows) and Predicted Label (columns), classes light, medium, heavy. Recovered cell values: light 0.99, 0.01, 0.00; medium 0.01, 0.94, 0.05; heavy 0.00, 0.05, 0.…]
Figure 7: Confusion matrix for the proposed model showing classification performance across all traffic congestion classes. The matrix reveals strong diagonal performance with minimal misclassification between adjacent congestion levels.
Figure 8: Representative misclassification examples showing boundary cases where the model struggles with ambiguous traffic scenarios. The moderate confidence scores (65-81%) indicate uncertainty in these challenging transition cases between medium and heavy congestion.

… This suggests that the attention mechanism can be influenced by motion-adjacent cues and optical-flow artifacts near boundaries, as well as hi…
Figure 9: Temporal attention visualization analysis across 16 consecutive frames showing attention pattern evolution over time. Top: FLO-EMD with flow-guided attention demonstrating consistent tracking of vehicle locations and adaptive focus on traffic-active regions throughout the sequence. Bottom: FLO-EMD V2 without flow-guided attention exhibiting static attention patterns concentrated on road infrastructure elem…
Original abstract

Accurate traffic congestion classification requires models that jointly capture roadway scene context and non-stationary traffic motion, yet most prior work treats these requirements in isolation. Vision-based methods often depend on appearance cues with standard temporal pooling, which can bias predictions toward static infrastructure, whereas signal-based approaches characterize temporal dynamics but lack the spatial context needed for scene-level localization. These complementary limitations motivate a unified framework that links motion evidence to spatial feature selection while preserving data-adaptive temporal characterization. This study therefore proposes FLO-EMD, a hybrid approach that couples motion-guided attention with empirical, data-driven temporal decomposition. Dense optical flow guides channel and spatial attention so that RGB features are refined toward motion-relevant regions. In parallel, aggregated flow statistics form compact motion traces that are decomposed using Empirical Mode Decomposition (EMD) to extract intrinsic temporal components. The resulting EMD embedding is fused with learned spatiotemporal representations to classify light, medium, and heavy congestion. Experiments on 1,050 five-second clips from four surveillance networks show that FLO-EMD achieves 97.5% overall test accuracy (weighted F1 = 0.9742), outperforming established baselines and remaining robust across diverse environmental conditions; ablation and sensitivity analyses further quantify the contributions of EMD, the number of intrinsic mode functions, and the selected motion descriptors.
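
To illustrate the decomposition step named in the abstract, here is a simplified sifting sketch in plain Python. It is not the authors' implementation: standard EMD interpolates envelopes with cubic splines and uses a convergence-based stopping rule, whereas this sketch uses piecewise-linear envelopes and a fixed number of sifting passes to stay short. The function names, `max_imfs`, and `sift_iters` are hypothetical.

```python
import math

def local_extrema(x):
    """Indices of strict interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through x[idx], held flat out to both ends."""
    n = len(x)
    pts = [(0, x[idx[0]])] + [(i, x[i]) for i in idx] + [(n - 1, x[idx[-1]])]
    env = [0.0] * n
    for (i0, y0), (i1, y1) in zip(pts, pts[1:]):
        for i in range(i0, i1 + 1):
            t = 0.0 if i1 == i0 else (i - i0) / (i1 - i0)
            env[i] = y0 + t * (y1 - y0)
    return env

def emd(signal, max_imfs=3, sift_iters=10):
    """Return (imfs, residue) such that the IMFs plus the residue sum to signal."""
    residue = list(signal)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few oscillations left to extract another mode
        h = list(residue)
        for _ in range(sift_iters):
            maxima, minima = local_extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            upper, lower = envelope(h, maxima), envelope(h, minima)
            # Subtract the local mean of the two envelopes (one sifting pass).
            h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
        imfs.append(h)
        residue = [r - v for r, v in zip(residue, h)]
    return imfs, residue

# Toy motion trace: a fast oscillation riding on a slow linear trend.
signal = [math.sin(2 * math.pi * i / 8) + 0.05 * i for i in range(80)]
imfs, residue = emd(signal)
```

The first extracted component captures the fastest oscillation and later components capture progressively slower ones, which is why the number of retained IMFs acts as a hyperparameter for the resulting embedding.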

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes FLO-EMD, a hybrid congestion classification framework that couples dense optical flow-guided channel and spatial attention on RGB features with Empirical Mode Decomposition (EMD) applied to aggregated flow statistics for extracting intrinsic temporal components. These are fused to classify light, medium, and heavy congestion. On a dataset of 1,050 five-second clips from four surveillance networks, the method reports 97.5% overall test accuracy (weighted F1 = 0.9742), outperforming baselines, with supporting ablation and sensitivity analyses on EMD components, number of intrinsic mode functions, and motion descriptors.

Significance. If the reported accuracy and robustness hold under properly controlled validation, the work offers a concrete advance in hybrid vision-signal methods for non-stationary scene understanding, addressing the complementary weaknesses of pure appearance-based and pure signal-based congestion classifiers. The explicit use of data-adaptive EMD on motion traces and the reported ablation quantifications of component contributions are strengths that could inform follow-on work in traffic monitoring and related dynamic classification tasks.

major comments (2)
  1. [Experiments / abstract] The experimental evaluation (abstract and §4/§5) does not specify whether the train/test split on the 1,050 clips is network-disjoint or camera-disjoint. With only four source networks, any non-disjoint split risks the model exploiting network-specific lighting, camera geometry, or background statistics that correlate with congestion labels, rather than learning general motion dynamics; this directly undermines the central 97.5% accuracy claim and the assertion of robustness across diverse conditions.
  2. [Ablation and sensitivity analyses] The sensitivity analysis on the number of intrinsic mode functions (a free hyperparameter listed in the axiom ledger) is mentioned but lacks details on how the value was selected without post-hoc optimization on the test distribution; if the chosen number was tuned after seeing test performance, the ablation results quantifying EMD contribution become circular and no longer support the headline performance numbers.
minor comments (2)
  1. [Experiments] Baseline implementations are referenced but lack explicit details on data splits, hyperparameter search, or statistical testing (e.g., multiple runs with standard deviations), making it difficult to assess whether the reported outperformance is robust.
  2. [Method] Notation for the fused embedding and attention modules could be clarified with an explicit diagram or equation showing how the EMD embedding is concatenated or attended with the spatiotemporal features.
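
The network-disjoint evaluation raised in major comment 1 can be sketched as a leave-one-network-out split. The clip identifiers and network names below are hypothetical placeholders; only the count of four networks comes from the paper.

```python
def leave_one_network_out(clips):
    """Yield (held_out_network, train_ids, test_ids) with network-disjoint folds.

    clips: list of (clip_id, network_name) pairs.
    """
    networks = sorted({net for _, net in clips})
    for held_out in networks:
        train = [cid for cid, net in clips if net != held_out]
        test = [cid for cid, net in clips if net == held_out]
        yield held_out, train, test

# Twelve placeholder clips spread evenly over four placeholder networks.
clips = [(f"clip{i:03d}", f"net{i % 4}") for i in range(12)]
folds = list(leave_one_network_out(clips))
```

Because every test fold contains only clips from a network the model never saw during training, accuracy under this protocol measures cross-network generalization rather than memorization of camera-specific backgrounds.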

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to incorporate clarifications and additional details as outlined.

Point-by-point responses
  1. Referee: [Experiments / abstract] The experimental evaluation (abstract and §4/§5) does not specify whether the train/test split on the 1,050 clips is network-disjoint or camera-disjoint. With only four source networks, any non-disjoint split risks the model exploiting network-specific lighting, camera geometry, or background statistics that correlate with congestion labels, rather than learning general motion dynamics; this directly undermines the central 97.5% accuracy claim and the assertion of robustness across diverse conditions.

    Authors: We acknowledge that the manuscript does not explicitly describe the train/test split procedure. The 1,050 clips were randomly partitioned into training, validation, and test sets (70/15/15 ratio) at the clip level without enforcing network-disjoint or camera-disjoint splits, as this was necessary to maintain class balance and adequate sample sizes given only four source networks. We agree this introduces a risk of the model capturing network-specific cues rather than purely general motion dynamics. In the revised manuscript, we will clearly state the split method, add a dedicated limitations discussion on this point, and include leave-one-network-out cross-validation results to quantify generalization across networks. revision: yes

  2. Referee: [Ablation and sensitivity analyses] The sensitivity analysis on the number of intrinsic mode functions (a free hyperparameter listed in the axiom ledger) is mentioned but lacks details on how the value was selected without post-hoc optimization on the test distribution; if the chosen number was tuned after seeing test performance, the ablation results quantifying EMD contribution become circular and no longer support the headline performance numbers.

    Authors: We agree that additional transparency is needed on the selection of the number of intrinsic mode functions. This hyperparameter was determined exclusively via grid search (values 1–10) on the training set using 5-fold cross-validation, selecting the value that maximized average validation accuracy; the test set was never used. We will revise the sensitivity analysis section to document this procedure in full, including the validation performance for each candidate number of IMFs, and explicitly confirm that no test data influenced the choice. revision: yes
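
The selection procedure described in the simulated rebuttal (grid search over 1 to 10 IMFs with 5-fold cross-validation on the training set only) can be sketched as below. The `evaluate` callback is a hypothetical stand-in for training and scoring the model; the toy scorer is invented so the example runs and does not reflect any reported result. The training size of 735 assumes the 70% share of 1,050 clips stated in the rebuttal.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def select_num_imfs(train_size, candidates, evaluate, k=5, seed=0):
    """Pick the candidate with the best mean validation score over k folds.

    evaluate(num_imfs, fit_idx, val_idx) -> score is supplied by the caller.
    Only training-set indices are ever passed in, so test data stays untouched.
    """
    folds = k_fold_indices(train_size, k, seed)
    mean_scores = {}
    for c in candidates:
        scores = []
        for i, val_idx in enumerate(folds):
            fit_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(evaluate(c, fit_idx, val_idx))
        mean_scores[c] = sum(scores) / k
    best = max(candidates, key=lambda c: mean_scores[c])
    return best, mean_scores

# Stand-in scorer that peaks at 4 IMFs; a real one would train the model.
def toy_evaluate(num_imfs, fit_idx, val_idx):
    return 1.0 - 0.01 * (num_imfs - 4) ** 2

best, scores = select_num_imfs(735, range(1, 11), toy_evaluate)
```

Keeping the test indices entirely outside `select_num_imfs` is what prevents the circularity the referee describes: the chosen value depends only on data the final evaluation never touches.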

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

This is an empirical ML paper proposing a hybrid FLO-EMD architecture that fuses flow-guided attention with EMD-based temporal decomposition, then trains and evaluates a classifier on held-out video clips. No equations, uniqueness theorems, or self-citations are invoked to derive performance metrics or architectural choices by construction; accuracy and ablation results are obtained via standard supervised training on a fixed dataset split rather than any reduction of outputs to fitted inputs or prior self-referential claims.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The framework rests on standard computer-vision and signal-processing assumptions plus a modest number of design choices whose impact is quantified only via ablation on the authors' own data.

free parameters (1)
  • number of intrinsic mode functions
    Hyperparameter controlling the EMD embedding dimensionality; its effect is studied via sensitivity analysis but remains a tunable choice.
axioms (2)
  • domain assumption: Dense optical flow reliably identifies motion regions relevant to congestion level
    Used to guide both channel and spatial attention without independent validation that flow errors do not systematically bias attention maps.
  • domain assumption: Aggregated flow statistics contain non-stationary temporal structure that EMD can meaningfully decompose for classification
    Core premise of the temporal branch; no proof or external benchmark is supplied.



Reference graph

Works this paper leans on

113 extracted references · 113 canonical work pages

  1. [1]

    2015 IEEE 18th International Conference on Intelligent Transportation Systems , pages=

    Reliability of probe speed data for detecting congestion trends , author=. 2015 IEEE 18th International Conference on Intelligent Transportation Systems , pages=. 2015 , organization=

  2. [2]

    Multimedia systems , volume=

    Video-based driver action recognition via hybrid spatial--temporal deep learning framework , author=. Multimedia systems , volume=. 2021 , publisher=

  3. [3]

    Journal of Intelligent Transportation Systems , volume=

    Convolutional neural network for recognizing highway traffic congestion , author=. Journal of Intelligent Transportation Systems , volume=. 2020 , publisher=

  4. [4]

    Transportation Research Record , volume=

    Traffic congestion detection from camera images using deep convolution neural networks , author=. Transportation Research Record , volume=. 2018 , publisher=

  5. [5]

    Journal of advanced transportation , volume=

    A deep learning based traffic state estimation method for mixed traffic flow environment , author=. Journal of advanced transportation , volume=. 2022 , publisher=

  6. [6]

    2011 14th international IEEE conference on intelligent transportation systems (ITSC) , pages=

    Video processing techniques for traffic flow monitoring: A survey , author=. 2011 14th international IEEE conference on intelligent transportation systems (ITSC) , pages=. 2011 , organization=

  7. [7]

    IEEE Transactions on intelligent transportation systems , volume=

    A review of computer vision techniques for the analysis of urban traffic , author=. IEEE Transactions on intelligent transportation systems , volume=. 2011 , publisher=

  8. [8]

    Proceedings of the AAAI conference on artificial intelligence , volume=

    Deep spatio-temporal residual networks for citywide crowd flows prediction , author=. Proceedings of the AAAI conference on artificial intelligence , volume=. 2017 , organization=

  9. [9]

    IAES International Journal of Artificial Intelligence , volume=

    Adaptive real time traffic prediction using deep neural networks , author=. IAES International Journal of Artificial Intelligence , volume=. 2019 , publisher=

  10. [10]

    2022 , school=

    Traffic congestion detection and optimizing traffic flow using object detection, optical flow and fluid dynamics , author=. 2022 , school=

  11. [11]

    2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

    Optical-flow features empirical mode decomposition for motion anomaly detection , author=. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2017 , organization=

  12. [12]

    Journal of Built Environment, Technology and Engineering , volume=

    Deterministic algorithm for traffic detection in free-flow and congestion using video sensor , author=. Journal of Built Environment, Technology and Engineering , volume=

  13. [13]

    Journal of industrial information integration , volume=

    Anomaly detection in NetFlow network traffic using supervised machine learning algorithms , author=. Journal of industrial information integration , volume=. 2023 , publisher=

  14. [14]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Learning memory-guided normality for anomaly detection , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  15. [15]

    IEEE transactions on signal processing , volume=

    Variational mode decomposition , author=. IEEE transactions on signal processing , volume=. 2013 , publisher=

  16. [16]

    Electronics , volume=

    A complex empirical mode decomposition for multivariant traffic time series , author=. Electronics , volume=. 2023 , publisher=

  17. [17]

    Mechanical Systems and Signal Processing , volume=

    Enhancement of adaptive mode decomposition via angular resampling for nonstationary signal analysis of rotating machinery: Principle and applications , author=. Mechanical Systems and Signal Processing , volume=. 2021 , publisher=

  18. [18]

    Proceedings of the AAAI conference on artificial intelligence , volume=

    Spatial temporal graph convolutional networks for skeleton-based action recognition , author=. Proceedings of the AAAI conference on artificial intelligence , volume=. 2018 , organization=

  19. [19]

    The Journal of Supercomputing , volume=

    Spatial-temporal graph convolutional networks for traffic flow prediction considering multiple traffic parameters , author=. The Journal of Supercomputing , volume=. 2023 , publisher=

  20. [20]

    IEEE Transactions on Knowledge and Data Engineering , volume=

    Spatio-temporal joint graph convolutional networks for traffic forecasting , author=. IEEE Transactions on Knowledge and Data Engineering , volume=. 2023 , publisher=

  21. [21]

    Alexandria Engineering Journal , volume=

    A combined method for short-term traffic flow prediction based on recurrent neural network , author=. Alexandria Engineering Journal , volume=. 2021 , publisher=

  22. [22]

    Pattern Recognition , volume=

    A decomposition dynamic graph convolutional recurrent network for traffic forecasting , author=. Pattern Recognition , volume=. 2023 , publisher=

  23. [23]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    The 5th AI City Challenge , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=. 2021 , organization=

  24. [24]

    International Transportation Economic Development Conference

    Congestion evaluation best practices , author=. International Transportation Economic Development Conference. Sheraton Dallas Hotel, Dallas, USA , pages=. 2014 , organization=

  25. [25]

    Texas: Texas Transportation Institute , year=

    Urban mobility report texas transportation institute , author=. Texas: Texas Transportation Institute , year=

  26. [26]

    Transportation research record , volume=

    Real-world carbon dioxide impacts of traffic congestion , author=. Transportation research record , volume=. 2008 , publisher=

  27. [27]

    Clean Air Journal , volume=

    Ambient air pollution: A global assessment of exposure and burden of disease , author=. Clean Air Journal , volume=

  28. [28]

    Science of the total environment , volume=

    Quantifying on-road vehicle emissions during traffic congestion using updated emission factors of light-duty gasoline vehicles and real-world traffic monitoring big data , author=. Science of the total environment , volume=. 2022 , publisher=

  29. [29]

    Transportation Research Part C: Emerging Technologies , volume=

    On feature selection for traffic congestion prediction , author=. Transportation Research Part C: Emerging Technologies , volume=. 2013 , publisher=

  30. [30]

    Middle-East Journal of Scientific Research , volume=

    A survey on intelligent transportation systems , author=. Middle-East Journal of Scientific Research , volume=

  31. [31]

    Transportation Research Part C: Emerging Technologies , volume=

    A real-time computer vision system for vehicle tracking and traffic surveillance , author=. Transportation Research Part C: Emerging Technologies , volume=. 1998 , publisher=

  32. [32]

    Journal of Intelligent Transportation Systems , volume=

    Connected and automated vehicle systems: Introduction and overview , author=. Journal of Intelligent Transportation Systems , volume=. 2018 , publisher=

  33. [33]

    2018 3rd International conference on computational systems and information technology for sustainable solutions (CSITSS) , pages=

    A review on video based vehicle detection, recognition and tracking , author=. 2018 3rd International conference on computational systems and information technology for sustainable solutions (CSITSS) , pages=. 2018 , organization=

  34. [34]

    International Journal of Signal Processing, Image Processing and Pattern Recognition , volume=

    Moving object tracking of vehicle detection: A concise review , author=. International Journal of Signal Processing, Image Processing and Pattern Recognition , volume=

  35. [35]

    IEEE Transactions on Intelligent Transportation Systems , volume=

    Deep learning on traffic prediction: Methods, analysis, and future directions , author=. IEEE Transactions on Intelligent Transportation Systems , volume=. 2021 , publisher=

  36. [36]

    IEEE Transactions on Intelligent Transportation Systems , volume=

    A hybrid deep learning model with attention-based conv-LSTM networks for short-term traffic flow prediction , author=. IEEE Transactions on Intelligent Transportation Systems , volume=. 2020 , publisher=

  37. [37]

    Comprehensive Survey and Analysis of Techniques, Advancements, and Challenges in Video-Based Traffic Surveillance Systems , author=. Int. J. Recent Innov. Trends Comput. Commun , volume=

  38. [38]

    2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA) , pages=

    Dynamic traffic system based on real time detection of traffic congestion , author=. 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA) , pages=. 2018 , organization=

  39. [39]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Motion guided attention for video salient object detection , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=. 2019 , organization=

  40. [40]

    IEEE Access , volume=

    Adaptive signal processing algorithms based on EMD and ITD , author=. IEEE Access , volume=. 2019 , publisher=

  41. [41]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Tam: Temporal adaptive module for video recognition , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=. 2021 , organization=

  42. [42]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Tea: Temporal excitation and aggregation for action recognition , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=. 2020 , organization=

  43. [43]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    X3d: Expanding architectures for efficient video recognition , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=. 2020 , organization=

  44. [44]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Mvitv2: Improved multiscale vision transformers for classification and detection , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=. 2022 , organization=

  45. [45]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Video swin transformer , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=. 2022 , organization=

  46. [46]

    Icml , volume=

    Is space-time attention all you need for video understanding? , author=. Icml , volume=. 2021 , organization=

  47. [47]

    Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

    Understanding traffic density from large-scale web camera data , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=. 2017 , organization=

  48. [48]

    2019 IEEE Pune section international conference (PuneCon) , pages=

    Hog, lbp and svm based traffic density estimation at intersection , author=. 2019 IEEE Pune section international conference (PuneCon) , pages=. 2019 , organization=

  49. [49]

    Proceedings of the IEEE conference on Computer Vision and Pattern Recognition , pages=

    A closer look at spatiotemporal convolutions for action recognition , author=. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition , pages=. 2018 , organization=

  50. [50]

    Artificial intelligence , volume=

    Determining optical flow , author=. Artificial intelligence , volume=. 1981 , publisher=

  51. [51]

    Two-Frame Motion Estimation Based on Polynomial Expansion , volume =

    Farnebäck, Gunnar , year =. Two-Frame Motion Estimation Based on Polynomial Expansion , volume =. In: Image analysis , doi =

  52. [52]

    and Singh, Sameer , title =

    Rodriguez-Serrano, Jose A. and Singh, Sameer , title =. Pattern Anal. Appl. , month = nov, pages =. 2012 , issue_date =. doi:10.1007/s10044-012-0269-7 , abstract =

  53. [53]

    IEEE International Conference on Image Processing 2005 , volume=

    Similarity based vehicle trajectory clustering and anomaly detection , author=. IEEE International Conference on Image Processing 2005 , volume=. 2005 , organization=

  54. [54]

    proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=

    Quo vadis, action recognition? a new model and the kinetics dataset , author=. proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages=. 2017 , organization=

  55. [55]

    Proceedings of the IEEE international conference on computer vision , pages=

    Learning spatiotemporal features with 3d convolutional networks , author=. Proceedings of the IEEE international conference on computer vision , pages=. 2015 , organization=

  56. [56]

    Advances in neural information processing systems , volume=

    Two-stream convolutional networks for action recognition in videos , author=. Advances in neural information processing systems , volume=

  57. [57]

    Proceedings of the European conference on computer vision (ECCV) , pages=

    Cbam: Convolutional block attention module , author=. Proceedings of the European conference on computer vision (ECCV) , pages=. 2018 , organization=

  58. [58]

    Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

    Non-local neural networks , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=. 2018 , organization=

  59. [59]

    Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

    Optical flow guided feature: A fast and robust motion representation for video action recognition , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=. 2018 , organization=

  60. [60]

    Proceedings of the IEEE international conference on computer vision , pages=

    Flow-guided feature aggregation for video object detection , author=. Proceedings of the IEEE international conference on computer vision , pages=. 2017 , organization=

  61. [61]

    Proceedings of the AAAI Conference on Artificial Intelligence , author=

    Motion Guided Spatial Attention for Video Captioning , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2019 , month=. doi:10.1609/aaai.v33i01.33018191 , abstractNote=

  62. [62]

    Long-term recurrent convolutional networks for visual recognition and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

  63. [63]

    Two-stream collaborative learning with spatial-temporal attention for video classification. IEEE Transactions on Circuits and Systems for Video Technology, 2018.

  64. [64]

    GTA: Global temporal attention for video action understanding. arXiv preprint arXiv:2012.08510.

  65. [65]

    The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 1998.

  66. [66]

    Yikang Rui, Yannan Gong, Yan Zhao, Kaijie Luo, and Wenqi Lu. Sustainability, 2024.

  67. [67]

    Probabilistic kernels for the classification of auto-regressive visual processes. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.

  68. [68]

    2025 Urban Mobility Report. 2025.

  69. [69]

    INRIX Global Traffic Scorecard.

  70. [70]

    Traffic Monitoring Guide. 2016.

  71. [71]

    Traffic Detector Handbook: Volume I. 2006.

  72. [72]

    Uender Barbosa de Souza, João Paulo Lemos Escola, and Leonardo da Cunha Brito. A survey on Hilbert-Huang transform: Evolution, challenges and solutions. 2022. doi:10.1016/j.dsp.2021.103292

  73. [73]

    MobileNetV2 with spatial attention module for traffic congestion recognition in surveillance images. Expert Systems with Applications, 2024.

  74. [74]

    Traffic congestion recognition based on convolutional neural networks in different scenarios. Engineering Applications of Artificial Intelligence, 2025.

  75. [75]

    Efficient road traffic video congestion classification based on the multi-head self-attention vision transformer model. Transport and Telecommunication, 2024.

  76. [76]

    Mitigating and evaluating static bias of action representations in the background and the foreground. Proceedings of the IEEE/CVF International Conference on Computer Vision.

  77. [77]

    Seeing beyond the scene: Analyzing and mitigating background bias in action recognition. arXiv preprint arXiv:2512.17953.

  78. [78]

    GMFlow: Learning optical flow via global matching. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  79. [79]

    Deep learning-based congestion detection at urban intersections. Sensors, 2021.

  80. [80]

    Two-stream video-based deep learning model for crashes and near-crashes. Transportation Research Part C: Emerging Technologies, 2024.

Showing first 80 references.