VideoAgent: Personalized Synthesis of Scientific Videos
Abstract
The technical complexity of research papers often limits their reach, motivating more accessible formats such as scientific videos that disseminate key insights through engaging narration. However, existing automated methods focus primarily on static posters or slide presentations, which remain template-bound and linear. Shifting to audience-adaptive video synthesis requires addressing non-linear narrative orchestration and the joint synchronization of disparate multimodal assets. We introduce VideoAgent, a modular framework that recasts scientific video synthesis as an intent-driven planning problem. By decoupling content understanding from multimodal synthesis, VideoAgent adaptively interleaves static slides with dynamic animations to match the semantic density of the narration. We further propose SciVidEval, a benchmark that evaluates multimodal quality and pedagogical utility through automated metrics and human knowledge-transfer studies. Extensive experiments demonstrate that VideoAgent effectively conveys complex technical logic with high narrative fidelity and communicative impact.
Forward citations
Cited by 1 Pith paper
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
PresentAgent-2 generates query-driven multimodal presentation videos with research grounding, supporting single-speaker, multi-speaker discussion, and interactive question-answering modes.