Short film dataset (sfd): A benchmark for story- level video understanding.arXiv preprint arXiv:2406.10221,

Ridouane Ghermi, Xi Wang, Vicky Kalogeiton, Ivan Laptev · 2024 · arXiv 2406.10221

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

dataset 1

citation-polarity summary

background 1

representative citing papers

InstructAV2AV: Instruction-Guided Audio-Video Joint Editing

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

InstructAV2AV is an end-to-end instruction-guided audio-video joint editing model that adapts a pre-trained backbone with gated attention and two-stage training, outperforming prior methods on 11 metrics after building the InsAVE-80K dataset.

AVI-Edit: Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner

cs.CV · 2025-12-11 · unverdicted · novelty 7.0

AVI-Edit enables precise audio-synchronized instance-level video editing via a granularity-aware mask refiner, a self-feedback audio agent, and a new large-scale annotated dataset.

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

cs.CV · 2025-01-06 · unverdicted · novelty 6.0

MotionBench is a new benchmark showing poor fine-grained motion understanding in VLMs and proposes TE Fusion to improve performance with higher frame rates.

Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding

cs.CV · 2025-08-28 · unverdicted · novelty 3.0

A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets while advocating for foundation models and reuse of decades of community experience.

citing papers explorer

Showing 4 of 4 citing papers.

InstructAV2AV: Instruction-Guided Audio-Video Joint Editing cs.CV · 2026-05-18 · unverdicted · none · ref 7
InstructAV2AV is an end-to-end instruction-guided audio-video joint editing model that adapts a pre-trained backbone with gated attention and two-stage training, outperforming prior methods on 11 metrics after building the InsAVE-80K dataset.
AVI-Edit: Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner cs.CV · 2025-12-11 · unverdicted · none · ref 24
AVI-Edit enables precise audio-synchronized instance-level video editing via a granularity-aware mask refiner, a self-feedback audio agent, and a new large-scale annotated dataset.
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models cs.CV · 2025-01-06 · unverdicted · none · ref 9
MotionBench is a new benchmark showing poor fine-grained motion understanding in VLMs and proposes TE Fusion to improve performance with higher frame rates.
Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding cs.CV · 2025-08-28 · unverdicted · none · ref 125
A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets while advocating for foundation models and reuse of decades of community experience.

Short film dataset (sfd): A benchmark for story- level video understanding.arXiv preprint arXiv:2406.10221,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer