Video- safetybench: A benchmark for safety evaluation of video lvlms.arXiv preprint arXiv:2505.11842, 2025

Xuannan Liu, Zekun Li, Zheqi He, Peipei Li, Shuhan Xia, Xing Cui, Huaibo Huang, Xi Yang, Ran He · 2025 · arXiv 2505.11842

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

representative citing papers

EgoSafetyBench: A Diagnostic Egocentric Video Benchmark for Evaluating Embodied VLMs as Runtime Safety Guards

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

EgoSafetyBench shows VLMs reliably spot hazard-containing videos but miss specific contextual hazards and are degraded by misleading in-scene text.

Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

Introduces LivingScreen benchmark for living-screen-native GUI agents on short-video platforms; frontier models fail to match human cost-accuracy due to over- and under-observation.

UNIVID: Unified Vision-Language Model for Video Moderation

cs.MM · 2026-06-04 · unverdicted · novelty 4.0

UNIVID generates policy-aware captions for video moderation, reducing violation leakage by 42.7% and overkill rate by 37.0% while replacing over 1,000 policy-specific models with a single backbone.

citing papers explorer

Showing 3 of 3 citing papers after filters.

EgoSafetyBench: A Diagnostic Egocentric Video Benchmark for Evaluating Embodied VLMs as Runtime Safety Guards cs.CV · 2026-06-30 · unverdicted · none · ref 19
EgoSafetyBench shows VLMs reliably spot hazard-containing videos but miss specific contextual hazards and are degraded by misleading in-scene text.
Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms cs.CV · 2026-06-03 · unverdicted · none · ref 35
Introduces LivingScreen benchmark for living-screen-native GUI agents on short-video platforms; frontier models fail to match human cost-accuracy due to over- and under-observation.
UNIVID: Unified Vision-Language Model for Video Moderation cs.MM · 2026-06-04 · unverdicted · none · ref 93
UNIVID generates policy-aware captions for video moderation, reducing violation leakage by 42.7% and overkill rate by 37.0% while replacing over 1,000 policy-specific models with a single backbone.

Video- safetybench: A benchmark for safety evaluation of video lvlms.arXiv preprint arXiv:2505.11842, 2025

fields

years

verdicts

representative citing papers

citing papers explorer