A Benchmark and Multi-Agent System for Instruction-driven Cinematic Video Compilation

2026 · cs.CV · arXiv:2604.10456

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

The surging demand for adapting long-form cinematic content into short videos has created a need for versatile automatic video compilation systems. However, existing compilation methods are limited to predefined tasks, and the community lacks a comprehensive benchmark for evaluating cinematic compilation. To address this, we introduce CineBench, the first benchmark for instruction-driven cinematic video compilation, featuring diverse user instructions and high-quality ground-truth compilations annotated by professional editors. To overcome contextual collapse and temporal fragmentation, we present CineAgents, a multi-agent system that reformulates cinematic video compilation as a "design-and-compose" paradigm. CineAgents performs script reverse-engineering to construct a hierarchical narrative memory that provides multi-level context, and it employs an iterative narrative planning process that refines a creative blueprint into a final compiled script. Extensive experiments demonstrate that CineAgents significantly outperforms existing methods, generating compilations with superior narrative and logical coherence.

representative citing papers

Crayotter: Traceable Multi-Agent Workflows for Long-Form Video Editing

cs.CV · 2026-05-31 · unverdicted · novelty 6.0

Crayotter introduces a traceable three-phase multi-agent workflow for long-form video editing that scores 3.40/5 in human evaluations, outperforming two baselines on 23 themes.

citing papers explorer


Crayotter: Traceable Multi-Agent Workflows for Long-Form Video Editing

cs.CV · 2026-05-31 · unverdicted · none · ref 33 · internal anchor
