Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs. arXiv preprint arXiv:2505.11842, 2025.
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (3)
Verdicts: unverdicted (3)
Representative citing papers:
- EgoSafetyBench: A Diagnostic Egocentric Video Benchmark for Evaluating Embodied VLMs as Runtime Safety Guards
  EgoSafetyBench shows VLMs reliably spot hazard-containing videos but miss specific contextual hazards and are degraded by misleading in-scene text.
- Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms
  Introduces LivingScreen benchmark for living-screen-native GUI agents on short-video platforms; frontier models fail to match human cost-accuracy due to over- and under-observation.
- UNIVID: Unified Vision-Language Model for Video Moderation
  UNIVID generates policy-aware captions for video moderation, reducing violation leakage by 42.7% and overkill rate by 37.0% while replacing over 1,000 policy-specific models with a single backbone.