InstructVid2Vid: Controllable Video Editing with Natural Language Instructions
3 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: unverdicted 3
Roles: background 1
Polarities: background 1
Citing papers explorer
- Sound Sparks Motion: Audio and Text Tuning for Video Editing
  Sound Sparks Motion is a test-time tuning approach that adjusts audio and text conditioning signals in multimodal video models using VLM feedback to produce specific motion edits while preserving content.
- Movie Gen: A Cast of Media Foundation Models
  A 30B-parameter transformer and related models generate high-quality videos and audio, claiming state-of-the-art results on text-to-video, video editing, personalization, and audio generation tasks.
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.