Simple and controllable music generation
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
V2M-Zero: Zero-Pair Time-Aligned Video-to-Music Generation
V2M-Zero achieves state-of-the-art video-to-music generation with improved temporal synchronization and semantic alignment by substituting video event curves into a fine-tuned text-to-music model without any paired training data.