A Benchmark and Multi-Agent System for Instruction-driven Cinematic Video Compilation
1 Pith paper cites this work.
abstract
The surging demand for adapting long-form cinematic content into short videos has motivated the need for versatile automatic video compilation systems. However, existing compilation methods are limited to predefined tasks, and the community lacks a comprehensive benchmark for evaluating cinematic compilation. To address this, we introduce CineBench, the first benchmark for instruction-driven cinematic video compilation, featuring diverse user instructions and high-quality ground-truth compilations annotated by professional editors. To overcome contextual collapse and temporal fragmentation, we present CineAgents, a multi-agent system that reformulates cinematic video compilation as a "design-and-compose" paradigm. CineAgents performs script reverse-engineering to construct a hierarchical narrative memory that provides multi-level context, and employs an iterative narrative planning process that refines a creative blueprint into a final compiled script. Extensive experiments demonstrate that CineAgents significantly outperforms existing methods, generating compilations with superior narrative and logical coherence.
fields
cs.CV (1)
years
2026 (1)
verdicts
UNVERDICTED (1)
representative citing papers
Crayotter: Traceable Multi-Agent Workflows for Long-Form Video Editing
Crayotter introduces a traceable three-phase multi-agent workflow for long-form video editing that scores 3.40/5 in human evaluations, outperforming two baselines on 23 themes.