Proceedings of the IEEE/CVF international conference on computer vision , pages=

Slowfast networks for video recognition , author=

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

representative citing papers

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

EyeCue: Driver Cognitive Distraction Detection via Gaze-Empowered Egocentric Video Understanding

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

EyeCue detects driver cognitive distraction by modeling gaze-visual context interactions in egocentric videos and achieves 74.38% accuracy on the new CogDrive dataset, outperforming 11 baselines.

LLaVA-Video: Video Instruction Tuning With Synthetic Data

cs.CV · 2024-10-03 · unverdicted · novelty 6.0

LLaVA-Video-178K is a new synthetic video instruction dataset that, when combined with existing data to train LLaVA-Video, produces strong results on video understanding benchmarks.

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

cs.CV · 2023-10-03 · unverdicted · novelty 6.0

LanguageBind aligns video, infrared, depth, and audio to a frozen language encoder via contrastive learning on the new VIDAL-10M dataset, extending video-language pretraining to N modalities.

MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization

cs.LG · 2026-05-03 · unverdicted · novelty 5.0

MER-DG applies modality-entropy regularization to reduce fusion overfitting in multimodal domain generalization, reporting average gains of 5% over standard fusion and 2% over prior methods on EPIC-Kitchens and HAC benchmarks.

Hybrid Congestion Classification Framework Using Flow-Guided Attention and Empirical Mode Decomposition

cs.CV · 2026-05-06 · unverdicted · novelty 3.0

FLO-EMD integrates flow-guided attention and EMD on aggregated motion traces to classify light, medium, and heavy congestion at 97.5% accuracy on 1,050 surveillance clips.

citing papers explorer

Showing 6 of 6 citing papers.

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding cs.CV · 2026-05-11 · unverdicted · none · ref 59
EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.
EyeCue: Driver Cognitive Distraction Detection via Gaze-Empowered Egocentric Video Understanding cs.CV · 2026-05-08 · unverdicted · none · ref 26
EyeCue detects driver cognitive distraction by modeling gaze-visual context interactions in egocentric videos and achieves 74.38% accuracy on the new CogDrive dataset, outperforming 11 baselines.
LLaVA-Video: Video Instruction Tuning With Synthetic Data cs.CV · 2024-10-03 · unverdicted · none · ref 212
LLaVA-Video-178K is a new synthetic video instruction dataset that, when combined with existing data to train LLaVA-Video, produces strong results on video understanding benchmarks.
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment cs.CV · 2023-10-03 · unverdicted · none · ref 136
LanguageBind aligns video, infrared, depth, and audio to a frozen language encoder via contrastive learning on the new VIDAL-10M dataset, extending video-language pretraining to N modalities.
MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization cs.LG · 2026-05-03 · unverdicted · none · ref 41
MER-DG applies modality-entropy regularization to reduce fusion overfitting in multimodal domain generalization, reporting average gains of 5% over standard fusion and 2% over prior methods on EPIC-Kitchens and HAC benchmarks.
Hybrid Congestion Classification Framework Using Flow-Guided Attention and Empirical Mode Decomposition cs.CV · 2026-05-06 · unverdicted · none · ref 97
FLO-EMD integrates flow-guided attention and EMD on aggregated motion traces to classify light, medium, and heavy congestion at 97.5% accuracy on 1,050 surveillance clips.

Proceedings of the IEEE/CVF international conference on computer vision , pages=

fields

years

verdicts

representative citing papers

citing papers explorer