Human-centric video anomaly detection through spatio-temporal pose tokenization and transformer
2 Pith papers cite this work.
Fields: cs.CV (2)
Years: 2026 (2)
Citing papers
- From Frames to Events: Rethinking Evaluation in Human-Centric Video Anomaly Detection
  State-of-the-art pose-based video anomaly detection models achieve over 52% frame-level AUC-ROC but drop below 10% event-level precision and 0.11 average F1 when evaluated with temporal action localization metrics on standard benchmarks.
- Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild
  Zero-shot MLLMs on ShanghaiTech and CHAD exhibit a strong conservative bias, with high precision but collapsed recall; class-specific prompts raise peak F1 from 0.09 to 0.64, yet recall remains the bottleneck.
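The gap these summaries describe, between frame-level scores and event-level precision, recall, and F1 borrowed from temporal action localization, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from either paper. Anomalies are treated as half-open frame intervals. Predicted events are matched one-to-one to ground-truth events by temporal IoU (tIoU) at a hypothetical threshold of 0.5. Frame-level performance is measured with thresholded precision and recall instead of score-based AUC-ROC. The function names `temporal_iou`, `event_level_prf`, and `frame_level_prf` are illustrative only and do not come from the papers' code.

```python
from typing import List, Tuple

# Hypothetical representation: an anomalous event is a half-open frame interval [start, end).
Segment = Tuple[int, int]


def _f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0


def temporal_iou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union (tIoU) of two half-open intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def event_level_prf(preds: List[Segment], gts: List[Segment],
                    thr: float = 0.5) -> Tuple[float, float, float]:
    """Event-level precision/recall/F1 (assumed protocol): each prediction may
    match at most one still-unmatched ground-truth event, and only if tIoU >= thr."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            if j in matched:
                continue
            iou = temporal_iou(p, g)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall, _f1(precision, recall)


def frame_level_prf(preds: List[Segment], gts: List[Segment],
                    num_frames: int) -> Tuple[float, float, float]:
    """Frame-level precision/recall/F1 on binary per-frame masks
    (a thresholded stand-in for score-based frame-level AUC-ROC)."""
    def to_mask(segs: List[Segment]) -> List[bool]:
        mask = [False] * num_frames
        for start, end in segs:
            for t in range(max(0, start), min(end, num_frames)):
                mask[t] = True
        return mask

    p, g = to_mask(preds), to_mask(gts)
    tp = sum(a and b for a, b in zip(p, g))
    fp = sum(a and not b for a, b in zip(p, g))
    fn = sum(b and not a for a, b in zip(p, g))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, _f1(precision, recall)


if __name__ == "__main__":
    # One long ground-truth anomaly, predicted as ten short fragments.
    gts = [(0, 100)]
    fragments = [(s, s + 10) for s in range(0, 100, 10)]
    print("frame-level:", frame_level_prf(fragments, gts, num_frames=100))
    print("event-level:", event_level_prf(fragments, gts))

    # A conservative detector: one exact hit out of four ground-truth events.
    gts4 = [(0, 50), (100, 150), (200, 250), (300, 350)]
    print("conservative:", event_level_prf([(0, 50)], gts4))
```

The first toy case yields (1.0, 1.0, 1.0) at the frame level. At the event level it yields (0.0, 0.0, 0.0), because each 10-frame fragment reaches only 0.1 tIoU with the 100-frame event. The second case gives (1.0, 0.25, 0.4): perfect precision, but recall caps F1. This mirrors, in miniature, the conservative high-precision, low-recall pattern reported for zero-shot MLLMs.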